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Abstract 

This paper argues that curvelets provide a powerful tool for representing very gen- 
eral linear symmetric systems of hyperbolic differential equations. Curvelets are a 
recently developed multiscale system E] in which the elements are highly anisotropic 
at fine scales, with effective support shaped according to the parabolic scaling principle 
width « length 2 at fine scales. We prove that for a wide class of linear hyperbolic differ- 
ential equations, the curvelet representation of the solution operator is both optimally 
sparse and well organized. 

• It is sparse in the sense that the matrix entries decay nearly exponentially fast 
(i.e. faster than any negative polynomial), 

• and well-organized in the sense that the very few nonnegligible entries occur near 
a shifted diagonal. 

Indeed, we actually show that the action of the wave-group on a curvelet is well- 
approximated by simply translating the center of the curvelet along the Hamiltonian 
flow — hence the diagonal shift in the curvelet representation. A physical interpretation 
of this result is that curvelets may be viewed as coherent waveforms with enough fre- 
quency localization so that they behave like waves but at the same time, with enough 
spatial localization so that they simultaneously behave like particles. 
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1 Introduction 



This paper is concerned with the representation of symmetric systems of linear hyperbolic 
differential equations of the form 

-gj; + 22Mx)-Q^- k +B(x)u = 0, u(0,x) =u (x), (1.1) 

k 

where u is an m-dimensional vector and x G W 1 . The matrices and B may depend on the 
spatial variable x, and the A^s are symmetric. Linear hyperbolic systems are ubiquitous 
in the sciences and a classical example are the equations for acoustic waves which read 



fe + V-GM =0 

Pof +V( C gp) =0, 



dt v " VFU«7 — « (-, «\ 



where it and p are the velocity and density disturbances respectively. (Here, po = po(x) 
is the density and cq = cq(x) is the speed of sound.) Other well-known examples include 
Maxwell's equations of electrodynamics and the equations of linear elasticity. All the results 
presented in this paper equally apply to higher-order scalar wave equations, e.g., of the form 

d 2 u d 2 u du 

-Q^-}^ ai ^ x > dxdx ' u(0,x) =u (x), —(0,x)= Ul (x), 

ij i k 

(where u is now a scalar and ciij(x) is taken to be symmetric and positive definite) as it is 
well-known that such single second-order equations can be reduced to a symmetric system 
of first-order equations fjl . 1|) by appropriate changes of variables. 

1.1 About representations 

We are interested in representations of the solution operator E(t) to the system p.lj) 

u(t, •) = E(t)u , 

which may be expressed as an integral involving the so-called Green's function 

u(t,x)= / E(t;x,y)u (y)dy. 



To introduce the concept of representation, suppose that the coefficient matrices do not 
depend on x. As is well-known, the Fourier transform is a powerful tool for studying 
differential equations in this setting. Indeed, in the Fourier domain, (|1 . 1|) takes the form 

(d t + iJ2Mk + B)u(t,0 = 0; 

k 

in short, (jl.lj) reduces to a system of ordinary differential equations which can be solved 
analytically. This shows the power of the representation; in the frequency domain, the 
solution operator is diagonal and the study becomes ridiculously simple. 

These desirable properties are very fragile, however. Both mathematicians and com- 
putational scientists know that Fourier methods are not really amenable to differential 
equations with variable coefficients, and that we need to find alternatives. Instead of con- 
sidering the evolution of Fourier coefficients, we may want to think, instead, of the action 
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of the propagation operator E(t) on other types of basis elements. This connects with the 
viewpoint of modern harmonic analysis whose goal is to develop representations, e.g. an 
orthonormal basis (f n ) of L 2 (M n ), say, in which the solution operator 



is as simple as possible; that is, such that E(t)f n is a sparse superposition of those elements 
f n '. Such sparse representations are extremely significant both in mathematical analysis, 
where sparsity allows for sharper inequalities and in numerical applications where sparsity 
allows for faster algorithms. 

• In the field of mathematical analysis, for example, Calderon introduced what one 
would nowadays call the Continuous Wavelet Transform (CWT) in which objects are 
represented as a superpositions of simple elements of the form ift((x — b)/a), with 
a > and b € R n ; i.e, objects are represented as a superposition of dilates and 
translates of a single function ip. These elements proved to be almost eigenfunctions 
of large classes of operators, the Calderon-Zygmund operators which are special types 
of singular integrals, some of which arising in connection with elliptic problems. It 
was later gradually realized that tools like atomic decompositions of Hardy spaces 
|17l I29| and orthonormal bases of Wavelets |22l I23| provide a setting in which some 
aspects of the mathematical analysis of these operators is dramatically eased. 

• Clever representation of scientific and engineering computations can make previously 
intractable computations tractable. Here, sparsity may allow the design of fast ma- 
trix multiplication and/or fast matrix inversion algorithms. For example, Beylkin, 
Coifman and Rokhlin [2] exploited the sparsity of those singular integrals mentioned 
above, and showed how to use wavelet bases to compute such integrals with very low 
complexity algorithms. 

In short, a single representation, namely, the wavelet transform provides sparse decompo- 
sitions of large classes of operators simultaneously. 

1.2 Limitations of Classical Multiscale Ideas 

Our goal in this paper is to find a representation which provides sparse representations of the 
solution operators to fairly general classes of systems of hyperbolic differential equations. 
Now the last two decades have seen the widespread development of multiscale ideas such 
as Multigrid, Fast Multipole Methods, Wavelets, Finite Elements with or without adaptive 
refinement, etc. All these representations propose dictionaries of roughly isotropic elements 
occurring at all scales and locations; the templates are rescaled treating all directions in 
essentially the same way. Isotropic scaling may be successful when the object under study 
does not exhibit any special features along selected orientations. This is the exception 
rather than the rule. 

Tools from traditional multiscale analysis are very powerful for representing certain 
elliptic problems but unfortunately, they are definitely ill-adapted to hyperbolic problems 
such as Indeed, 

1. they fail to sparsify the wave propagation, i.e. the solution operator E(t), 

2. and they fail to provide a sparse representation of oscillatory signals which are the 
solutions of those equations. 




(1.3) 
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To make things concrete, consider the problem of propagating elastic waves as in geophysics. 
Consider the scalar wave equation in two dimensions 

d tt u = c 2 (x)Au, (1.4) 

where A is the Laplacian defined by A = d 2 jdx\ + d 2 /dx?, (we may take the velocity field to 
be constant). To describe the action of the wave group, we assume that the initial condition 
takes the form of a wavelet with vanishing initial velocity, say. Then it is clear that at a 
later time, the wavefield is composed of large concentric rings (imagine throwing a stone 
in a lake). Now, it is also clear that many wavelets are needed to represent the wavefield. 
In other words, the wavefield is a rather dense superposition of wavelets. Note that this 
may be quantified. Suppose that the velocity field is identically equal to one, say, and that 
the initial condition is a wavelet at scale 2 _ - ? ; that is, of the form 2^ l 2 "^^? x) so that in 
frequency, the energy is concentrated near the dyadic annulus |£| ~ 2 3 . Then one would 
need at least 0(2 3 ) wavelets to reconstruct the wavefield at time t = 1 to within reasonable 
accuracy. 

Our simple example above shows wave-like flows do not preserve the geometry and 
characteristics of classical multiscale systems. To achieve sparsity, we need to rethink the 
geometry of multiscale representations. 

1.3 A New Form of Multiscale Analysis 

As we will see in section |2 curvelets are waveforms which are highly anisotropic at fine 
scales, with effective support obeying the parabolic principle length ~ width 2 . Just as 
for wavelets, there is both a continuous and a discrete curvelet transform. A curvelet is 
indexed by three parameters which — adopting a continuous description of the parameter 
space — are: a scale a, < a < 1; an orientation 8, 8 £ [— ir/2, vr/2) and a location b, b G M 2 . 
At scale a, the family of curvelets is generated by translation and rotation of a basic element 

l Pa,b,e( x ) = <Pa(Re(x - b)). 

Here, (p a (x) is some kind of directional wavelet with spatial width ~ a and spatial length 
~ yfa, and with minor axis pointing in the horizontal direction 

ip a {x) « ip(D a x), D a = 

D a is a parabolic scaling matrix, Rg is a rotation by 8 radians. 

An important property is that curvelets obey the principle of harmonic of analysis 
which says that it is possible to analyze and reconstruct an arbitrary function f(x\,X2) 
as a superposition of such templates. It is possible to construct tight-frames of curvelets 
and one can, indeed, easily expand an arbitrary function f{x\,x%) as a series of curvelets, 
much like in an orthonormal basis. Continuing at an informal level of exposition, there is 
a sampling of the plane (a, b, 8) 

aj = 2~ j , 8 U = 2tt£ ■ 2~Li/2J ^ R dj /^ = (fc l2 -J, k 2 2~ j / 2 ), 
such that with u indexing the triples (aj, 8j £, b^'^) the collection ip^ is a tight-frame: 

/ = E</>^>/" 11/112 = Ek/>^>i 2 - ( L5 ) 



i/a o y 

I 1/y/El ' 
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Spatial viewpoint Frequency viewpoint 

Figure 1: Schematic representation of the support of a curvelet in both space and frequency. 
In the spatial domain, a curvelet has an envelope strongly aligned along a specified 'ridge' 
while in the frequency domain, it is supported near a box whose orientation is aligned with 
the co-direction of the ridge. 

(Note that these formulae allow us to analyze and synthesize arbitrary functions in L 2 (IR 2 ) 
as a superposition of curvelets in a stable and concrete way.) 

As we have seen, a curvelet is well-localized in space but it is also well-localized in 
frequency. Recall that a given scale, curvelets ip^ are obtained by applying shifts and 
rotations to a 'mother' curvelet ¥?j,o,o- I n the frequency domain then 

0j,oAO = 2~ 3 ^ 4 W(2^\^\)V(2^9). 

Here, W, V are smooth windows compactly supported near the intervals [1, 2] and [—1/2, 1/2] 
respectively. Whereas in the spatial domain curvelets live near an oriented rectangle R of 
length 2~ J / 2 and width 2 _J , in the frequency domain, they are located in a parabolic wedge 
of length 2 J and width 2 J//2 and whose orientation is orthogonal to that of R. The joint 
localization in both space allows us to think about curvelets as occupying a 'Heisenberg 
cell' in phase-space with parabolic scaling in both domains. Figure ^ offers a schematic 
representation of this joint localization. As we shall see, this microlocal behavior is key 
to understanding the properties of curvelet-propagation. Additional details are given in 
section |2 

1.4 Curvelets and Geometrical Optics 

A hyperbolic system can typically be considered in the approximation of high-frequency 
waves, also known as geometrical optics. In order to best describe our main result, it is 
perhaps suitable first to exhibit the connections between curvelets and geometrical optics. 
In that setting it is not necessary to describe the dynamics in terms of the wavefield u(t,x). 
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Only its prominent features are studied: wave fronts, or equivalently rays. The latter are 
trajectories (x(i),£(i)) in phase-space K 2 x IR 2 , and are the solutions to the m Hamiltonian 
flows (indexed by u) 



x{t) =V 5 A°(:r,0, x(0) =x . 



o/_ ^ *™ _ t (1-6) 



The system 1)1.60 is also called the bicharacteristic flow and the rays (2; (£),£(<)) the bichar- 
acteristics. To see how this system arises, consider the classical high-frequency wave- 
propagation approximation to the wavefield u(x,t) of the form 



i(x,t) = J2^ u{x ' t) (^,oOM) + 

1J ^ 



where w is a large parameter; it then follows (after substituting the approximation in the 
wave equation 1)1.10 ) that the phase functions $ u must obey the eikonal equations 

d t $ v + \° u (x,V x $) = 0, (1.7) 

and that the amplitudes must obey 'transport' equations we shall not detail here (see section 
0J. In the above expression (and in the Hamiltonian equations 1)1.60 ). the A°(x,£) are the 
eigenvalues of the dispersion matrix 

a°(x,0 = J2M^r (1-8) 

k 

It is well-known that the Hamiltonian equations describe the evolution of the wavefront set 
of the solution as $> v (t,x(t)) is actually constant along the z/th Hamiltonian flow (|1.6|) . 

With this background, we are now in a position to qualitatively describe the behavior 
of the wave-propagation operator E(t) acting on a curvelet ip^. However, we first need 
to introduce a notion of vector- valued curvelet since E(t) is acting on vector fields. Let 
r i>( x >0 De the eigenvector of the dispersion matrix associated with the eigenvalue A^(x,£). 
We then define hyper-curvelets by 

= (2^)2 / e^r° u (x,OMOd!i. (1.9) 



Later in this section, we will motivate this special choice but for now simply observe that 
ippv is a vector-valued waveform. 

Consider then the solution to the wave equation tp^(t,x) with initial value tp^(x). 
Then we claim that 



the action of the wave group on a hyper- curvelet is well- approximated by simply 
translating the center of the curvelet along the corresponding Hamiltonian flow. 

To examine this claim, set n(t) = £(£)/|£(i)| and consider the reduced Hamiltonian flow 
obtained by adding the equation 

U(t) = -U(n ® V x Xl(x, n) - X° v (x, n) tg> n) (1.10) 

to the system p. 60 . Here, U(t) is the rotation matrix tracking the orientation rotation 
of £(i) in the sense that at all times U(t)n(t) = n(0) = £(0)/|f(0)|, or n(t) = U(t)*n{0). 



C 



Figure 2: Schematic representation of the action of the wave group on a hyper-curvelet. 
The solution is well-approximated by rigid motion along the Hamiltonian flow. 

Our claim says that the solution to the wave equation nearly follows the dynamics of the 
reduced Hamiltonian flow, i.e. 

<p$(t,x) « p$(U,(t)(x - s M (i)) + z M ) (1.11) 

where x^ft) and Up(t) are the rays of (jl.lUj) with initial starting points 3^(0) = x^ and 
Ufj,(0) = Id. Figure 13 illustrates this fact. 

We now return to the interpretation of a hyper-curvelet. Suppose that r£ only depends 
on £ as in the case of the acoustic system (|1.2|) 

*0 = f^. 4«) = ^(T')- 

(Here and below, £ denotes the vector obtained from £ after applying a rotation by 90 
degrees). In this special case, we see that the hyper-curvelet is obtained by multiplying — 
in the frequency domain — a scalar-valued curvelet with the eigenvectors of the dispersion 
matrix 

This is useful for the curvelet tp^u will essentially follow only one flow, namely, the z/th 
flow. Suppose we had started, instead, with an initial value of the form ip^ v = c^e,,, where 
e u is the canonical basis of M 3 , say. Then our curvelet would have interacted with the three 
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eigenvectors of the dispersion matrix, and would have 'split' and followed the three distinct 
flows. By forcing ip^}, (£) to be aligned with r°(£), we essentially removed the components 
associated with the other flows. In the general case 1)1.9)1 . we build hyper-curvelets by 
applying R®, which is now a pseudo-differential operator with symbol r®(x,(;), mapping 
scalars to m-dimensional vectors, and independent of time. The effect is of course the 
same. 

Note that when r° is independent of x, hyper-curvelets build-up a (vector-valued) tight- 
frame; letting [F, G] be the usual inner product over three-dimensional vector fields in 
L 2 (M 2 ), the family ((p^u)^ obeys the reconstruction formula 

and the Parseval relation 

M! 2 =£lh¥$]| 2 . (1.13) 

Just as one can decompose a scalar field as a superposition of scalar curvelets, one can 
analyze and synthesize any wavefield as a superposition of hyper-curvelets in a stable and 
concrete way. For arbitrary r®(x,£), this is, however, in general not true. 

We would like to emphasize that although the Eikonal equations only have solutions 
for small times, the approximation 1)1. 11|) and, more generally, all of our results are valid 
for all times since the rays ()1.6)) are always well-defined, see section 1.6 below for a more 
detailed discussion. 



1.5 Curvelets and Hyperbolic Systems 

The previous section gave a qualitative description of the action of the wave group on a 
curvelet and we we shall now quantify this fact. The evolution operator E(t) acting on a 
curvelet (p^v is of course not exactly another curvelet which occurs at a displaced 

location and orientation. Instead, it is a superposition of curvelets ct^ipffl sucn that 

1. the coefficients (a^) decay nearly exponentially, 

2. and the significant coefficients of this expansion are all located at indices (//, v) 'near' 
(/io(t), vq). By near, we mean nearby scales, orientations and locations. 

To state the key result of this paper, we need a notion of distance u between curvelet 
indices which will be formally introduced in sectional Crudely, u(fA, fjf) is small if and only 
if both curvelets are at roughly the same scale, have similar orientation and are at nearby 
spatial locations. In the same spirit, the distance w(//, //) increases as the distance between 
the scale, angular, and location parameters increases. 

For each fi = (j, k, £) and v = 1, . . . , m, define the vector- valued curvelets 

where e„ is the ^th canonical basis vector in M m . The <£w's inherit the tight-frame property 
Ql. 12)1 - 1)1. 13)1 . We would like to again remind the reader that these vector- valued curvelets 
are simpler and different from the hyper-curvelets (pffi defined in the previous section. 
Consider now the representing the operator E(t) in a tight-frame of vector- valued curvelets, 
namely, 

E(t; n, v- fi\ v') = (cp^, E^cp^). (1.15) 
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We will refer to E(t; fx, v; fx! , u') or simply E as the curvelet matrix of E(t), with row index 
fx, v and column index fx', v' . Decompose the initial wavefield uq = v c^ip^. Then one 
can express the action of E(t) on no in the curvelet domain as 

E{t)u = c l w{t)ip i Lw i Cp,(t) = E(t; fx', v'; fx, v)^^ 

ia> fi',u' 

with convergence in L 2 (M. 2 ,C m ). In short, the curvelet matrix maps the curvelet coefficients 
of the initial wavefield uq(-) into those of the solution u(t, •) at time t. 

Theorem 1.1. Suppose that the coefficients A^{x) and B(x) of the hyperbolic system, are 
C°° and that the multiplicity of the eigenvalues of the dispersion matrix Yl,kAk{x)£,k is 
constant in x and £. Then 

• The matrix E is sparse. Suppose a is either a row or a column of E, and let |a|( n ) be 
the n-largest entry of the sequence \a\, then for each M > 0, \a\t n ) obeys 

M(n) < C tM ■ n~ M . (1.16) 

• The matrix E is well-organized. For each N > 0, the coefficients obey 

m 

\E{t;fx,v;fx',v')\<C tN - cj(fi, (x' u „(t)y N . (1.17) 

v"=l 

Here fx v (t) is the curvelet index fx flown along the vth Hamiltonian system. 

Both constants CtM and CtN grow in time at most like C\e C2t for some C\, C2 > depend- 
ing on M , resp. N . 

In effect, the curvelet matrix of the solution operator resembles a sum of m permutation 
matrices where m is the order of the hyperbolic system; first, there are significant coefficients 
along m shifted diagonal and second, coefficients away from these diagonals decay nearly 
exponentially; i.e. faster than any negative polynomial. Now just as wavelets provide 
sparse representations to the solution operators to certain elliptic differential equations, 
our theorem shows that curvelets provide an optimally sparse representation of solution 
operators to systems of symmetric hyperbolic equations. 

We can also resort to hyper-curvelets as defined in the previous section and formulate 
a related result where the curvelet matrix is sparse around a single shifted diagonal. This 
refinement approximately decouples the evolution into polarized components and will be 
made precise later. 

To grasp the implications of Theorem ll.il consider the following corollary: 

Corollary 1.2. Consider the truncated operator As obtained by keeping m ■ B elements 
per row — the B closest to each shifted diagonal in the sense of the pseudo- distance u. Then 
the truncated matrix obeys 

\\A-A B \\ L 2^ L 2 <C M -B~ M , (1.18) 

for each M > 0. 

The proof follows from that of Theorem 11.11 by an application of Schur's lemma and 
is omitted. Hence, whereas the Fourier or wavelet representations are dense, curvelets 
faithfully model the geometry of wave propagation as only a few terms are needed to 
represent the action of the wave group accurately. 
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1.6 Strategy 

In his seminal paper ^S], Lax constructed approximate solution operators to linear and 
symmetric hyperbolic systems, also known as parametrices. He showed that these paramet- 
rices are oscillatory integrals in the frequency domain which are commonly referred to as 
Fourier integral operators (FIO) (the development and study of FIOs is motivated by the 
connection). An operator T is said to be an FIO if it is of the form 

Tf(x) = J e^^a(x,0f(0dt. (1.19) 

We suppose the phase function $ and the amplitude a obey the following standard assump- 
tions |2H]: 

• the phase $(x,£) is C°°, homogeneous of degree 1 in £, i.e. $(x,\£) = \<f>(x,£) for 
A > 0, and with = V X V^, obeys the nondegeneracy condition 

|det^ xf (x,OI>c>0, (1.20) 

uniformly in x and £; 

• the amplitude a is a symbol of order m, which means that a is C°°, and obeys 

\&^dla{x,i)\ <C Q/3 (l + |£|r-H. (1.21) 

Lax's insight is that the solution of the initial value problem for a variable coefficient 
hyperbolic system can be well approximated by a superposition of integrals of the form 
()1.19j) with matrix-valued amplitudes of order 0. The phases of these FIO's are, of course, 
those solving the Eikonal equations (|1.7|) . Hence, a substantial part of our argument will be 
about proving that curvelets sparsify FIO's. Now an important aspect of this construction 
is that this approximation is only valid for small times whereas our theorem is valid for all 
times. The reason is that the solutions to the Eikonal equations (jl.7|) are not expected to be 
global in time, because & u would become multi- valued when rays originating from the same 
point xq cross at a later time. This typically happens at cusp points, when caustics start 
developing. We refer the reader to ^JESl- Because, we are interested in a statement valid 
for all times, we need to bootstrap the construction of the FIO parametrix by composing 
the small time FIO parametrix with itself. Now this creates an additional difficulty. Each 
parametrix convects a curvelet along m flows, and we see that after each composition, the 
number of curvelets would be multiplied by m, see section 4.1 for a proper discussion. This 
would lead to matrices with poor concentration properties. Therefore, the other part of the 
argument consists in decoupling the equations so that this phenomenon does not occur. In 
summary, the general architecture of the proof of Theorem II. II is as follows: 

• We first decompose the wave-field into m-one way components, i.e. components which 
essentially travel along only one flow. We show that this decomposition is sparse in 
tight-frames of curvelets. 

• Second, we show that curvelet representations of FIO's are optimally sparse in tight- 
frame of curvelets, a result of independent interest. 

As a side remark, we would like to point out that the result about optimally sparse 
representations of FIO's was announced without a proof in the companion paper This 
paper, however, gives the first proof of this optimality result. 
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1.7 Inspiration and Relation to Other Work 

Underlying our results is a mathematical insight concerning the central role for the anal- 
ysis of hyperbolic differential equations, played by the parabolic scaling, in which analysis 
elements are supported in elongated regions obeying the relation width ~ length 2 . In fact, 
curvelets imply the same tiling of the frequency plane as the Second Dyadic Decomposition 
(SDD), a construction introduced in the seventies by Stein and Fefferman |15M29| . originally 
for the purpose of understanding boundedness of Riesz spherical means, and later widely 
adapted to the study of various Fourier Integral Operators. More specifically, we would like 
to single out the work of Hart Smith with which we became familiar while working on this 
project. Smith [2Sj used parabolic scaling to define function spaces preserved by Fourier 
Integral Operators and to analyze the behavior of wave equations with low-regularity 
coefficients j22j. The latter reference actually develops curvelet-like systems which provide 
a powerful tool to derive so-called sharp Strichartz estimates for solutions to such equations 
in space dimensions d = 2, 3 (a Strichartz estimate is a bound on the norm of the solution in 
some appropriate functional space, e.g. L p ). We find the connection with the work of Smith 
especially stimulating. From a broader viewpoint, the literature on the subject indicates 
that curvelets are in some sense compatible with a long tradition in harmonic analysis. 

The fact that the action of a FIO should be seen as a 'rearrangement of wave packets' 
was discovered by Cordoba and Fefferman in their visionary paper ^01- They show how 
simple proofs of I? boundedness and the Garding inequality follow in a straightforward way 
from the decomposition into wave packets. Without rotations and the parabolic scaling, 
however, their estimates are not sharp and do not completely capture the geometry of 
FIO's. 

Next, there is of course the inspiration of modern computational harmonic analysis 
(CHA) whose agenda is the development of orthobases, tight-frames, which are 'optimal' 
for representing objects (operators, functions) of scientific interest together with rapid 
algorithms to compute such representations. The point of view here is to develop new 
mathematical ideas and and turn these ideas into effective algorithms and these effective 
algorithms into effective and targeted applications. At the beginning of this introduction, 
we mentioned an instance of this scientific vision: (1) wavelets provide sparse representa- 
tions of objects with punctuated smoothness and of large classes of singular integrals and 
other pseudo-differential operators |22[ I23j : (2) there are fast discrete wavelet transforms 
operating in 0(N) for a signal of length N |21j : (3) this creates an opportunity for targeted 
applications in signal processing where wavelets allow for better compression scien- 
tific computing where wavelets allow for faster algorithms |2| , and for statistical estimation 
where wavelets allow for sharper reconstructions This vision was perhaps championed 
in [2] where (l)-(3) were combined to demonstrate how one can use the wavelet transform 
to compute certain types of singular integrals in a number of operations of the order of 
C(e) • NlogN where C(e) is a constant depending upon the desired accuracy e. 

1.8 Significance 

We would like to mention how we see our work fit with the vision described above. 

• Curvelets and wavefronts. Curvelets are ideal for representing wavefront phenomena 
UJ, or objects which display curve-punctuated smoothness — smoothness except for 
discontinuity along a general curve with bounded curvature [3J Ej • This fact originally 
motivated their construction jUIS]- For example, [2] established that curvelets provide 
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the sparsest representations of functions which are C 2 away from piecewise C 2 edges. 
Such representations are nearly as sparse as if the object were not singular and turn 
out to be far more sparse than the wavelet decomposition of the object. 

Hence, we see that curvelets provide the unique opportunity for having a representa- 
tion giving enhanced sparsity of wave groups, and simultaneously of the solution space. 
We believe that this will eventually be of great practical significance for applications 
in fields which are great consumers of these mathematical models, e.g. seismics. 

• New ideas for new numerical solvers. Clearly, Theorem 11.11 may serve as a basis for 
faster geometric multiscale PDE solvers. In fact, this paper is the first of a projected 
series showing how one can exploit the structure of the curvelet transform and the 
enhanced sparsity of wave groups to derive new numerical low-complexity algorithms 
for accurately computing the solution to large classes of differential equations, see the 
concluding section for a discussion. 

• Digital curvelet transforms. In order to deploy curvelet-like ideas in practical appli- 
cations, one would need a digital notion of curvelet transform which (1) would be 
rapidly computable and (2) would be geometrically faithful in the sense that one 
would want an accurate digital analog of the corresponding geometric ideas defined 
at the level of the continuum. There actually is progress on this front. Donoho and 
one of the authors recently proposed a Digital Curvelet Transform via Unequispaced 
Fast Fourier Transforms (DCTvUSFFT) |7j. The DCTvUSFFT is a fast algorithm 
which allows analysis and synthesis of Cartesian arrays as superpositions of discrete 
curvelets; for practical purposes, the algorithm runs in 0(N log N) operations for 
input array of size N. Digital curvelets obey sharp frequency and spatial localization. 

In short, this paper is an essential piece of a much larger body of work. 
1.9 Contents 

Section 2 below reviews the construction of Curvelets. Section 3 below examines second- 
order scalar hyperbolic equations and gives a heuristic indicating why the sparsity may 
be expected to hold. Section 4 links our main result with properties of FIOs. Section 5 
proves that FIO's are optimally sparse in scalar curvelet tight-frames. Section 6 discusses 
implications of this work, namely, in the area of scientific computing. Finally, proofs of key 
estimates supporting our main result are given in Section 7. 

2 Curvelets 

This section briefly introduces tight frames of curvelets, see [2] for more details. 
2.1 Definition 

We work throughout in R 2 , with spatial variable x, with £ a frequency-domain variable, and 
with r and 9 polar coordinates in the frequency-domain. We start with a pair of windows 
W(r) and V(t), which we will call the 'radial window' and 'angular window', respectively. 
These are both smooth, nonnegative and real- valued, with W taking positive real arguments 



12 



and supported on r £ [1/2, 2] and V taking real arguments and supported on t £ [—1, 1]. 
These windows will always obey the admissibility conditions: 

oo 

W 2 (2 j r) = 1, r > 0; (2.1) 

j=—oo 
oo 

V 2 (t-£) = 1, teR. (2.2) 

^=-oo 

Now, for each j > jo, we introduce ipj{x\,X2) defined in the Fourier domain by 

fyfc) = 2~ 3j / 4 W(2- j \£\) ■ V{2^^9) (2.3) 

Thus the support of ipj is a polar 'wedge' defined by the support of W and V, the radial 
and angular windows, applied with scale-dependent window widths in each direction. 

We may think of <pj as a "mother" curvelet at scale 2~ 3 in the sense that all curvelets 
at that scaled are obtained by rotations and translations of ipj. Introduce 

• the equispaced sequence of rotation angles Ojj = 2tt ■ 2~Li/ 2 J ■ £, < £ < Lj = 2u' 2 i ; 

• and the sequence of translation parameters k = (k±, k 2 ) 6 Z 2 . 

With these notations, we define curvelets (as function of x = (xi,^)) a t scale 2~ 3 , orien- 
tation 9j t £ and position o£ = R$ jt (ki ■ 2 _J /5i, &2 ■ 2~^ 2 j8-i) for some adequate constants 
5i,8 2 by 

As in wavelet theory, we also have coarse scale elements. We introduce the low-pass 
window Wo obeying 

\W (r)\ 2 + J2\W(2- j r)\ 2 = l, 

j>o 

and for k\,k 2 £ Z, define coarse scale curvelets as 

= ~ ^ j0 k), = 2-j°W (2-^\). 

Hence, coarse scale curvelets are nondirectional. The 'full' curvelet transform consists of 
the fine-scale directional elements (tf>j t, k)j>jo t k an d of the coarse-scale isotropic father 
wavelets (&j ,k)k- For our purposes, it is the behavior of the fine-scale directional elements 
that matters. 

In the remainder of the paper, we will use the generic notation (y^)^gM to index the 
elements of the curvelet tight-frame. The dyadic-parabolic subscript [i stands for the triplet 
(j, k,£). We will also make use of the convenient notations 

• = oi is the center of tp^ in space. 

• Op = 9j t £ is the orientation of ip^ with respect to the vertical axis in x. 

• £ At = (2 J cos Op, 2 3 sin0^) is the center of (p^ in frequency. 

• e M = Cm/ I Cm I indicates the codirection of p^. 
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2.2 Properties 



We now list a few properties of the curvelet transform which will play an important role 
throughout the remainder of this paper. 

1. Tight-frame. Much like in an orthonormal basis, we can easily expand an arbitrary 
function f{x\,x%) £ L 2 (M 2 ) as a series of curvelets: we have a reconstruction formula 

with equality holding in an L 2 sense; and a Parseval relation 

2. Parabolic scaling. The frequency localization of ipj implies the following spatial 
structure: (fj(x) is of rapid decay away from an 2 _J by 2~'^ 2 rectangle with minor 
axis pointing in the horizontal direction. In short, the effective length and width obey 
the anisotropy scaling relation 

length « 2~ j / 2 , width « 2~ j width « length 2 . (2.4) 

3. Oscillatory behavior. As is apparent from its definition, tpj is actually supported 
away from the vertical axis £i = but near the horizontal £2 = axis. In a nutshell, 
this says that <fj(x) is oscillatory in the xi-direction and lowpass in the ^-direction. 
Hence, at scale 2~ 3 , a curvelet is a little needle whose envelope is a specified 'ridge' 
of effective length 2~ 3 '/ 2 and width 2~i , and which displays an oscillatory behavior 
across the main 'ridge'. 

4. Phase-Space Tiling/Sampling. We can really think about curvelets as Heisenberg 
tiles of minimum volume in phase-space. In x, the essential support of (p^ has size 
0(2~ 3 x 2 -J ' 2 ). In frequency, the support of (p^ has size 0(2? I 2 x2P). The net volume 
in phase-space is therefore 

0{2~i x 2~H 2 ) • 0{2 j > 2 x 2 j ) = O(l), 

which is in accordance with the uncertainty principle. The parameters (J, k, £) of the 
curvelet transform induce a new non-trivial sampling of phase-space, Cartesian in x, 
polar in £, and based on the parabolic scaling. 

5. Complex- valuedness. Since curvelets do not obey the symmetry <Pu,(—Q = 

(fn is complex-valued. There exists a related construction for real-valued curvelets 
by simply symmetrizing the construction, see 0. The complex- valued transform is 
better adapted to the purpose of this paper. 

Figure |31 summarizes the key components of the construction. 
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Figure 3: Curvelet tiling of Phase-Space. The figure on the left represents the sampling in 
the frequency plane. In the frequency domain, curvelets are supported near a 'parabolic' 
wedge. The shaded area represents such a generic wedge. The figure on the right schemat- 
ically represents the spatial Cartesian grid associated with a given scale and orientation. 

2.3 Curvelet Molecules 

We introduce the notion of curvelet molecule; our objective, here, is to encompass under this 
name a wide collection of systems which share the same essential properties as the curvelets 
we have just introduced. Our formulation is inspired by the notion of 'vaguelettes' in 
wavelet analysis j^SJ. Our motivation for introducing this concept is the fact that operators 
of interest do not map curvelets into curvelets, but rather into these molecules. Note that 
the terminology 'molecule' is somewhat standard in the literature of harmonic analysis |17) . 

Definition 2.1. A family of functions (m^)^ is said to be a family of curvelet molecules 
with regularity R if (for j > 0) they may be expressed as 

m^(x)=2 3 ^ 4 a^ (D 2 -jRe ll x - k') , 

where k' = ^p-) and where for all /i, the a" 's verify the following properties: 

• Smoothness and spatial localization: for each \f3\ < R, and each M = 0, 1, 2 . . ., there 
is a constant Cm > such that 

\d^\x)\<C M -(l + \x\)- M . (2.5) 

• Nearly vanishing moments: for each N = 0, 1, . . . , R, there is a constant CV > such 
that 

|a^(0| <C N -min(l,2-J + + 2- i/2 M) N . (2-6) 

Here, the constants may be chosen independently of /i so that the above inequalities hold 
uniformly over fi. There is of course an obvious modification for the coarse scale molecules 
which are of the form a^\x — k') with as in \2.5\) . 
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This definition implies a series of useful estimates. For instance, consider 9^ = so that 
Rg^ is the identity (arbitrary molecules are obtained by rotations). Then, m M obeys 

K(s)| < C M ■ 2 3 ^ ■ (l + \7?x x - || + |2^/2 X2 - ||) M (2.7) 
for each M > and \(3\ < R, and similarly for its derivatives 

\d?m,(x)\ < Cm ■ 2 3 ^ 4 • 2^+^ ■ (l + \2> Xl - ^| + \V' 2 x 2 - ^|V^ . (2.8) 

Another useful property is the almost vanishing moments property which says that in the 
frequency plane, a molecule is localized near the dyadic corona {2 J < |£| < 2 J+1 }; 
obeys 

\m^)\ <C N - 2- 3 ^. min (i ) 2^(l + |£|)) Ar , (2.9) 
which is valid for every N < R, which gives the frequency localization 

\m l *(£)\<C N -2-W*-\S lt (S)\ N (2.10) 

where for fiQ = (j, 0, 0), 

= min(l, 2^(1 + |£|)) • (1 + |2"^i| + ^t^)- 1 . (2.11) 

For arbitrary [/,, is obtained from S^ by a simple rotation of angle 9^, i.e. S fl0 (Rg /i ^). 
Similar estimates are available for the derivatives of 0^. 

In short, a curvelet molecule is a needle whose envelope is supported near a ridge of 
length about 2 -J / 2 and width 2~^ and which displays an oscillatory behavior across the 
ridge. It is easy to show that curvelets as introduced in the previous section are indeed 
curvelet molecules for arbitrary degrees R of regularity. 

2.4 Near Orthogonality of Curvelet Molecules 

Curvelets are not necessarily orthogonal to each other 1 , but in some sense they are almost 
orthogonal. As we show below, the inner product between two molecules and pu' decays 
nearly exponentially as a function of the 'distance' between the subscripts [i and //. 

This notion of distance in phase-space, tailored to curvelet analysis, is to be understood 
as follows. Given a pair of indices fi = (j,k,£),fjf = (j',k',£'), define the dyadic-parabolic 
pseudo-distance 

= 2l J '- J 'l • (l + mm(2 j ,2 j ')d(fi,fi')^j , (2.12) 

where 

d(n,y!) = \9fj, - 9^\ 2 + \xfj. - ay | 2 + \{e^,x^ - ay)|. 

Angle differences like 9^ — 9„> are understood modulo ir. As introduced earlier, is the 
codirection of the first molecule, i.e., e M = (cos 9^, sin 0^). 

The pseudo-distance (j2.12|) is a slight variation on that introduced by Smith [27]. We 
see that oo increases by at most a constant factor every time the distance between the scale, 
angular, and location parameters increases. The extension of the definition of uj to arbitrary 
points (x, £) and is straightforward. Observe that the extra term {(e^x^ — x^/)\ 

induces a non-Euclidean notion of distance between and x^. The following properties 
of u are proved in section [7"T1 (The notation A x B means that C\ < A/B < C2 for some 
constants C%, C2 > 0.) 

It is an open problem whether orthobases of curvelets exist or not. 
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Proposition 2.2. 1. Symmetry: co(fi,fj,') xcj(//,/u). 

2. Triangle inequality: d(fx, fjf) < C ■ (d(fi,fj,") + d((i" , //)) /or some constant C > 0. 
5. Composition: for every integer N > 0, and some positive constant CV 

^. Invariance under Hamiltonian flows: cj(/x,//) X uj(fi v (t), n' u (t)). 

We can now state the almost orthogonality result 

Lemma 2.3. Let {m^)^ and (pu')u' be two families of curvelet molecules with regularity 
R. Then for j, j' > 0, 

|(nV,jy>| <C n -u><ji,ii!)- n . (2.13) 
/or every N < f(R) where f(R) goes to infinity as R goes to infinity. 

Proof. Throughout the proof of (|2.13[) . it will be useful to keep in mind that A < C ■ 
(1 + \B\)~ M for every M < 2M' is equivalent to A < C • (1 + B 2 )~ M for every M < M' . 
Similarly, if A < C ■ (1 + |-Bi|) _A/ and A < C ■ (1 + |B 2 |)~ M for every M < 2M', then 
A < C • (1 + | Si | + \ B2\)~ M for every M < M' . Here and throughout, the constants C may 
vary from expression to expression. 

For notational convenience put A9 = 9^ — 0^ and Ax = — x^> . We abuse notation by 
letting m^ be the molecule a^\D 2 -jRe l _ l x), i.e., m^ is obtained from by translation 
to that it is centered near the origin. Put I^i = {m^Pfji). In the frequency domain, I^i 
is given by 

V = 7^)2 / ™»o(OP^M)e^ A * H dt. 
Put jo to be the minimum of j and j' . The Appendix shows that 

J ^(Or di < C • 2 3 ^/4+3//4 . 2-li-i'l* . (i + y°\A6\ 2 )- N , (2.14) 

where 5u is defined in equation ()2.1H) . Therefore, the frequency localization of the curvelet 
molecules (|2.10(l gives 

/ K*(£)II^(£)K < J\s, (os^(0\ N dc 

< c • 2-1^''^ ■ (l + 2i Q \A6\ 2 )~ N . (2.15) 

This inequality explains the angular decay. A series of integrations by parts will introduce 
the spatial decay, as we now show. 
The partial derivatives of rh^ obey 

1^(01 < C • 2- 3 ^ 4 • 2"^+^) • \S^)\ N . 

Put to be the Laplacian in £. Because p^i is misoriented with respect to e^, simple 
calculations show that 

ia^koi <c-2- 3 ^. 2 -^".|^(e)r, 

\^PA0\ < C ■ 2- 3 ^"/4 . (2-2/ + 2-i'l sin(A#)| 2 ) • 1^(0^. 
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Recall that for t £ [— vr/2, n/2], 2/ir ■ \t\ < |sini| < \t\, so we may just as well replace 
|sin(A#)| by |A#| in the above inequality. Set 

T-T_ OJO A £ f 

s i + 2?o|Aepae?' 

On the one hand, for each fc, L k {rh^p^i) obeys 

|L fe (m M ^)(£)| < C ■ 2-^/4-3.74 . 1^)1* . |5^ /(0 |JV. 
On the other hand 

L fc e -*^K = [1 + 2^|A*| 2 + 1 + 2J0|Ag|2 K^ Ax)| 2 ] fc e- J (^)'«. 
Therefore, a few integrations by parts give 

I Vi < c-2-i^- (i + v°\e, - o (\ + 2^1 a*| 2 + _|!^| v A x)| : 

and then 

I VI < C ■ 2~\^'\ M • ^1 + 2*>(|Afl| 2 + |Ax| 2 ) + 1 - \(e„ Ax)| 2 j . 

One can simplify this expression by noticing that 

, a , , 2 2 ^|(e At ,Ar E )| 2 ^ r , — — 2^\(e„Ax)\ 



-N 



(l + 2 3a \A6\ 2 ) + . -fj > a/1 +9Jo|Afl|2 = = 2 JO |(e„,Ax)|. 

V 1 |; 1 + 2J«|A0| 2 ~V 1 1 ^i + 2io|A0| 2 ^ 

This yields equation (|2.13j) as required. □ 

Remark. Assume that one of the two terms or both terms are coarse scale molecules, 
e.g. pur, then the decay estimate is of the form 

K"Im>2v)I - c • 2_jAr • i 1 + I x m - vl 2 + K^^m - v)l) _7V • 

For instance, if they are both coarse scale molecules, this would give 

\(m^,Pfi')\ < C ■ (l + \Xfj, - ay|) . 

The following result is a different expression for the almost-orthogonality, and will be 
at the heart of the sparsity estimates for FIO's. 

Lemma 2.4. Let (m^u and be two families of curvelet molecules with regularity R. 

Then for each p > p* , 

Here p* — > as R — > oo. In other words, for p > p* , the matrix Iu^i = {{m^^p^))^^ 
acting on sequences (a M ) obeys 

\\Iol\W ^ C p ' \\ a hp- 
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Proof. Put as before jo = mm(j,f). The appendix shows that 



]T (l + ^(d(jJL,fjf))~ Np <C-^'\ (2.16) 



provided that Np > 2. We then have 

1 vr < c • E 2- 2|j - j,|iVp • 2 2 i^"i < q 



provided again that iVp > 2. 

Hence we proved that for p < 1, I is a bounded operator from £ p to £ p . We can of course 
interchange the role of the two molecules and obtain 



su pE K rr v>?v)l p ^ Cp- 



For p = 1, the above expression says that I is a bounded operator from ^ to £oo- By- 
interpolation, we then conclude that / is a bounded operator from t p to t p for every p. □ 

3 Heuristics 

We would like to explain why curvelets are special and why we may expect sparsity by 
considering the situation of the classical acoustic wave propagation equation IJ1.4JI 

d 2 u o, \ a 
-Qp = c ( x ) Au ' 

with initial conditions u(x,0) = uq(x), ^|(x,0) = u\(x). To picture the action of the 
solution operator on a curvelet, consider the case where the velocity is constant, e.g. c = 1. 
Put t) to be the spatial Fourier transform of the solution at time t (and similarly for 
uq,u\). In the Fourier domain, the PDE (|l,4j) is then transformed into an ODE whose 
solution is given by the relation 

u{i,t) = cos(|£|t)«o(0 + (3-1) 

To understand the sparsity we need only to consider the sparsity of the multiplication by 
cos(|£|t), as the exact same arguments apply to the other term. In addition, since 

cos(|^) = (e^l* + e -*l€l*)/2, 

we may just as well study the sparsity of the the multiplication by e*^l*. We will now 
explain how the parabolic scaling plays an essential role. 

3.1 Frequency localization 

Because the frequency support of a curvelet (p^ is effectively a thin dyadic rectangle of 
sidelengths about 2 J in the direction e p and about 2- ? / 2 in the orthogonal direction, the 
modulus of |£| is close to the component of £ along e M , and we may want to linearize the 
phase as follows 
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The point is that the "phase perturbation" <5(£) = |£| — £ • is bounded over rectangles 
which support With these notations, the phase multiplication takes the form 

e i|€|t ^(0 = e W(0 *(e i «- e ^ M (0); 

now the multiplier e* a ^* is not really oscillatory (at least for £ not too large) thanks to 
<5(£) = O(l). Therefore, the phase multiplication amounts to a linear modulation followed 
by a multiplication with a smooth term. The frequency picture is given in figure |IJ In the 
spatial domain, this simply corresponds to a shift by t along the co-direction of the curvelet 
followed by a convolution with a gentle kernel. 

As it seems reasonable to assume that curvelets sparsify both translations and convolu- 
tions, one might expect sparse representations of the solution operator, at least in the case 
where the coefficients of the differential equations are constant. 

3.2 Spatial localization 

On the one hand, curvelets have enough frequency localization so that they approximately 
behave like waves. But on the other, they have enough spatial localization so that the flow 
will essentially preserve their shape, even when the velocity is varying. 

At high frequencies or equivalently at small scales, a curvelet has an envelope strongly 
aligned along a specified 'ridge' of length 2~^ 2 and width 2~ J and is oscillatory across the 
main 'ridge' with a frequency roughly inversely proportional to its width. Now the key- 
point is that the velocity field does not vary very much over the support of a curvelet. In 
fact, owing to the geometric relation width ~ length 2 , the velocity field cannot significantly 
bend our curvelet. To see this fact, let us model the action of the velocity field as a warping 
g(x). Consider now the warped curvelet ip^(g(x)). Then the warping leaves the curvelet 
almost invariant; i.e. our curvelets approximately remain needles essentially fitting in boxes 
of side-lengths 2~^ 2 and 2~K See figure El This behavior is of course very different from 
that of classical waves of the form e lx '^. Because such waves have elongated support, the 
wavefronts are geometrically distorted. 

3.3 The parabolic scaling is special 

There is something truly remarkable about the parabolic scaling. Instead of curvelets 
obeying the law 

width ~ length 2 

one may want to consider the family of element obeying the more general power-law 

width ~ length , 1 < a < oo. 

However, we have seen that to exhibit the wave-like behavior, one needs enough frequency 
localization. In our setup, we need that in the frequency domain, our waveform lie in a 
box of width t and length L obeying £ 2 /L = 0(1). In the spatial domain, this minimum 
anisotropy imposes the relationship 

width < length 2 . 

But we cannot afford too much anisotropy as otherwise curvelets would be distorted. In the 
spatial domain, this says that curvelets have to be supported near boxes obeying L 2 < t or 
in other words 

width > length 2 . 
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Figure 5: The bottom ellipse 
indicates the essential support 
of (ffj,, initially at (a^,^). Af- 
ter propagation, the essential 
support of </V(i) ^ s war P e d but 
not by much. 



In conclusion, if we want both these properties to hold simultaneously, namely, wave-like 
behavior and particle- like behavior (no essential distortion by applying a smooth warping), 
only one scaling may possibly work: the parabolic scaling. 

4 Representation of Linear Hyperbolic Systems 

We now return to the main theme of this paper and consider linear initial-value problems 
of the form 



where in addition to the properties listed in the introduction, A k and B together with all 
their partial derivatives are uniformly bounded for x £ R n . As explained in section FOl we 
need to make the technical assumption that for every set of real parameters the (real) 
eigenvalues of the matrix have constant multiplicity in x and £. 

Our goal is to construct a concrete 'basis' of L 2 (M n ,C m ) in which the evolution is as 
simple/sparse as possible. We present a solution based on the newly developed curvelets — 
which were introduced in section |2] — and choose to specialize our discussion to n = 2 spatial 
dimensions. The reason is twofold: first, this setting is indeed that in which the exposition 
of curvelets is the most convenient; and second, this is not a restriction as similar results 
would hold in arbitrary dimensions. 

4.1 Main result 

We need to prove 




in 




(4.1) 




(4.2) 



.a 
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for some constant C^jv > growing at most like C^e Kiqt for some Cn,Kn > 0. The 
sum over v" indexes the different flows and takes on as many values as there are distinct 
eigenvalues \®„. 

It is instructive to notice that the estimate ()4.2|) for t = is already the strongest of its 
sort on the off-diagonal decay of the Gram matrix elements for a tight frame of curvelets. 
For t > 0, equation (|4,2j) states that the strong phase-space localization of every curvelet 
is preserved by the hyperbolic system, thus yielding a sparse and well-organized structure 
for the curvelet matrix. These warped and displaced curvelets are 'curvelet molecules' as 
introduced in section |2~31 because, as we will show, they obey the estimates (|2.5|l and (|2.6|) . 

The choice of the curvelet family being complex-valued in the above theorem is not es- 
sential. E(t) acting on real-valued curvelets would yield two molecules per flow (upstream 
and downstream). Keeping track of this fact in subsequent discussions would be unneces- 
sarily heavy. In the real case, it is clear that the structure and the sparsity of the curvelet 
matrix can be recovered by expressing each real curvelet as a superposition of two complex 
curvelets. 

4.2 Architecture of the proof of the main result 

Much of the remainder of this paper is devoted to the justification of Theorem 11.11 or 
equivalently (|4.2j) . However, before engaging in the formal proof, we would like to present 
the overall architecture of the argument. 

• Decoupling into polarized components. The first step is to decouple the wavefield 
u(t,x) into m one-way components f y {t,x) 

m 

u(t,x) = 22R v f„(t,x), 

v=l 

where the R u are operators mapping scalars to m-dimensional vectors, and indepen- 
dent of time. The f u will also be called 'polarized' components. This allows a separate 
study of the m flows corresponding to the m eigenvalues of the matrix X^fcLi ^-fc( x )£fc- 
In the event these eigenvalues are simple, the evolution operator E(t) can be decom- 
posed as 

m 

E{t) = ^ Rue~ itAv L u + negligible, (4.3) 

u=l 

where the L„'s are operators mapping m-dimensional vectors to scalars and the 
Ay's are one-way wave operators acting on scalar functions. In effect, each operator 
E v (t) = e~ lt ^ convects wave- fronts and other singularities along a separate flow. The 
'negligible' contribution is a smoothing operator — not necessarily small. The compo- 
sition operators R u and decomposition operators L v are provably pseudo-differential 
operators, see section H~3l 

• Fourier Integral Operator parametrix. We then approximate for small times t > 
each e _l * A ", v = l,...,m, by an oscillatory integral or Fourier Integral Operator 
(FIO) F v (t). Such operators take the form 

F u (t)f(x) = J e^ x ^a v (t,x,0f^)dC, 

under suitable conditions on the phase function <)?„(£, x, £) and the amplitude a u (t, x, £). 
Again, the identification of the evolution operator E v (t) = e~*' A " with F v is valid up 
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to a smoothing and localized additive remainder. The construction of the so-called 
parametrix F v (t) and its properties are detailed in section 14.41 

Historically JH]) the construction of an oscillatory integral parametrix did not in- 
volve the decoupling into polarized components as a preliminary step. When applied 
directly to the system (|4.1|) , the construction of the parametrix gives rise to a matrix- 
valued amplitude a(t, x, £) where all the couplings are present. This somewhat simpler 
setting, however, is not adequate for our purpose. The reason is that we want to boot- 
strap the construction of a parametrix to large times by composing the small time 
FIO parametrix with itself, F{nt) = [F(t)] n . Without decoupling of the propaga- 
tion modes, each E{t) or F{t) involves convection of singularities along m families of 
characteristics or flows. Applying F(t) again, each flow would artificially split into m 
flows again, yielding m 2 fronts to keep track of. At time T = nt, that would be at 
most m n fronts. This flow-splitting situation is not physical and can be avoided by 
isolating one-way components before constructing the parametrix. The correct large- 
time argument is to consider E u (nt) for small t > and large integer n as [E u (t)] n . 
This expression involves one single flow, indexed by v. 

• Sparsity of Fourier Integral Operators. The core of the proof is found in section [5] and 
consists in showing that very general FIO's F(t), including the parametrices F v {t), 
are sparse and well-structured when represented in tight frames of (scalar) curvelets 
ip^. The scalar analog of Theorem 11.11 for FIO's is Theorem 15.11 — a statement of 
independent interest. Observe that pseudo-differential operators are a special class of 
FIO's and, therefore, are equally sparse in a curvelet frame. 

Section [4.61 assembles key intermediate results and proves Theorem ll.il 
4.3 Decoupling into polarized components 

How to disentangle the vector wavefield into m independent components is perhaps best 
understood in the special case of constant coefficients, Ak(x) = A^, and with B{x) = 0. 
In this case, applying the 2-dimensional Fourier transform on both sides of 1)4.1(1 gives a 
system of ordinary differential equations 



(Note that a(£) is a symmetric matrix with real entries.) It follows from our assumptions 
that one can find m real eigenvalues Aj,(£) and orthonormal eigenvectors ?"„(£), so that 



Put fu(t,£) = r u (£) ■ u(t,^). Then our system of equations is of course equivalent to the 
system of independent scalar equations 



du 

7u 



k 



-^r(t,0 + *MO/(t,0 = o, 



which can then be solved for explicitly; 



e -ftMO/„(0,£). 
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Hence, the diagonalization of a(£) decouples the original equation (|4.1j) into m polarized 
components; these can be interpreted as waves going in definite directions, for example 'up 
and down' or 'outgoing and incoming' depending on the geometry of the problem. This is 
the reason why /„ is also referred to as being a 'one-way' wavefield. 

The situation is more complicated when A^{x) is non- uniform since Fourier techniques 
break down. A useful tool in the variable coefficient setting is the calculus of pseudo- 
differential operators. An operator T is said to be pseudo- differential with symbol a if it 
can be represented as 

T/ (X) = a(x, D)f = — ^ / e****(x, £)/(£) d£, (4.4) 

with the convention that D = — iV. It is of type (1, 0) and order m if a obeys the estimate 

\d£d%a(x,Q\ <C Q ^-(l + |^|r-H 

for every multi-indices a and (3. Unless otherwise stated, all pseudo-differential operators 
in this paper are of type (1, 0). An operator is said to be smoothing of order — oo, or simply 
smoothing if its symbol satisfies the above inequality for every m < 0. Observe that this is 
equivalent to the property that T maps boundedly distributions in the Sobolev space H~ s 
to functions in H s for every s > 0, in addition to a strong localization property of its kernel 
G(x,y) which says that for each N > 0, there is a constant CV > such that G obeys 

\G(x,y)\<C N -(l + \x-y\)- N (4.5) 

as in EH] [Chapter 6]. 
Now set 

m 

a(x, D)=J2 M x ) D k ~ iB(x), 
k=l 

and its principal part 

m 

a°(x,D)=J2A k (x)D k , 

k=l 

so that equation Q4.1j) becomes dtit + ia(x, D)u = 0. The matrices a(x,£) (resp. a°(x,£)) are 
called the symbol of the operator a(x, D) (resp. a°(x, D)). Note that a (x,£) is homogeneous 
of degree one in £; a also goes by the name of dispersion matrix. 

It follows from the symmetry of and B that for every set of real parameters £i , . . . , £ m , 
the matrix a°(x,£) = X]fc^fc( 3; )6c is a l so symmetric and thus admits real eigenvalues 
A^(x,£) and an orthonormal basis of eigenvectors r°(x,£), 

a°(x,0r° v (x,0 = X u (x,0r° v (x,0- (4.6) 

The eigenvalues being real and the set of eigenvectors complete is a hyperbolicity condition 
and ensures that equation (|4.1|) will admit wave-like solutions. We assume throughout this 
paper that the multiplicity of each \®(x, £) is constant in x and £. 

By analogy with the special case of constant coefficients, a first impulse may be to 
introduce the components r®(x, D) ■ u, where r®(x,D) is the operator associated to the 
eigenvector r®(x,£) by the standard rule (|4.4jl . In particular this is how we defined hyper- 
curvelets from curvelets in section 11.41 Unfortunately, this does not perfectly decouple 
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the system into m polarized modes — it only approximately decouples. Instead, we would 
achieve perfect decoupling if we could solve the eigenvalue problem 

a(x, D)r v (x, D) = r u (x, D)X v (x, D). (4.7) 

Here, each A u = X u (x,D) is a scalar operator and R v = r u (x,D) is an m-by-1 vector of 
operators. Equation ()4.7j) must be understood in the sense of composition of operators. 
Now let /„ be the polarized components obeying the scalar equation 

^ + iA v f v = 0, (4.8) 
with initial condition f v (0,x) and consider the superposition 

u = ^2 u u , u u = R v f u . 

v 

Then u is a solution to our initial- value problem (|4.1|) . (We will make this rigorous later, 
and detail the dependence between the initial values uq and the f u (0, -)'s.) 

The following result and its proof show how in some cases, (|4.7|) can be solved up to a 
smoothing remainder of order — oo. When all the eigenvalues \®(x,£) are simple, the exact 
diagonalization is, in fact, possible. The situation is more complicated when some of the 
eigenvalues are degenerate. This complication does not compromise, however, any of our 
results. 

Lemma 4.1. Suppose our hyperbolic system satisfies all the assumptions stated below j4-l]) . 
Then there exists an m-by-m block- diagonal matrix of operators A and two m-by-m matrices 
of operators R and S such that 

a(x, D)R = RA + S, 

where A, R and S are componentwise pseudo-differential with A of order one, R of order 
zero, and S of order — oo. Each block of A corresponds to a distinct eigenvalue A° whose 
size equals the multiplicity of that eigenvalue. The principal symbol of A is diagonal with 
the eigenvalues \®(x,£) as entries. 

The lemma says that polarized components corresponding to distinct eigenvalues can 
be decoupled modulo a smoothing operator. However, further decoupling within the 
eigenspaces is in general not possible. We would like to point out that a similar result 
was already given in |3 1 j although in a different context and with a different proof, and was 
used in [30] . 

Proof. We already argued (|4,7|) is not just the eigenvalue problem for the symbol a(x,£) 
for the composition of two operators does not reduce to a multiplication of their respective 
symbols. Instead, it is common practice |16j to define the twisted product of two symbols 
a and t as 

(o-$t)(x,D)=<t(x,D)t(x,D), 

so that (JHH) becomes the symbol equation a\r v = r u $\ u . Note that D = —iV. The 
explicit formula for the twisted product is, in multi-index notation 2 , 

^=£^^- 

M>o 

2 All the pseudo-differential operators considered in this paper are of type (1,0) therefore all such poly- 
homogeneous expansions are valid. 
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We can see that a (j r is the product ar up to terms that are at least one order lower 
(because of the differentiations in £). 

Recall the decomposition of the symbol a(x, £) into a principal part a°(x, £) = ^(x)^, 
homogeneous of degree one in £, and a remainder homogeneous of order zero. It follows 
that the eigenvalues A° (x, £) of a°(x, £) are homogeneous of degree one, and the correspond- 
ing eigenvectors r®(x, £) may be selected as homogeneous of degree zero (and orthonormal) . 
Up to terms of lower order in £, the original problem (|4,7jl therefore reduces to the eigen- 
value problem a°(x,£)r®(x,£) = r®(x,£)\®(x,£) for the symbol a . It is then natural to 
look for a solution r u , \ u of (|4.7|) as a perturbation of rjj, by lower-order terms. 

Consider first the case in which each eigenvalue A° is simple and define the expansions 



~ r 



+ rl + rl + ..., \ v ~ A° + Ai + A? + . . 



f 1 ' V 



so that r™ is of order — n in £ and A™ of order — n + 1 i.e., 

i^c(x,oi<^(i + iei)-"-H, 

and similarly for A™. We plug these expansions in the twisted product, or equivalently in 
(|4.7|) . and isolate terms of identical degree. 

The contribution at the leading order is, of course, a°r^ = A^r^J and the remainder is of 
the form a ft — j) A^; put as its principal symbol. The zero-order equation reads 

(a°-A°/)ri = -e°+rX (4.9) 

which admits a solution if and only if the right hand side has a zero component in the 
eigenspace spanned by r°. This is possible if A* is selected so that 

-el + rl\l Lrl O Aj = r° • e°. 

It follows that equation ()4.9j) admits the family of solutions 

ri = (a°-A°/)- 1 (- e o+rOAi) + / 1 rO, 

where /i is actually a scalar function of x and £, and homogeneous of degree -1 in £. Our 
proof does not exploit this degree of freedom. 

It is clear that one can successively determine all the AJJ's and r™'s in a similar fashion. 
Let el be the principal symbol of a jj (r° + . . . + r£) - (r° + . . . + r") fl (A° + . . . + A"), then 
the equation at the order — n is 

(a°-AO/K +1 = -e™ + rOA™ +1 , 

and is solved exactly like (|4.9|) . 

Suitable cut-offs of the low frequencies guarantee convergence of the series for r v and A^. 
As is standard in the theory of pseudo-differential operators |281 132j , one selects a sequence 
of C°° cut-off functions x™(0 = x( e ™£) f° r some x vanishing inside a compact neighborhood 
of the origin, and identically equal to one outside a larger neighborhood. Then e is taken 
small enough so that 

oo oo 
n=0 n=0 
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are converging expansions in the topology of C°°. As a result, the remainder s v = a%r v — 
r v (j X u also converges to a valid symbol which, by construction, is of order — oo, i.e. obeys 

\d%dP s „(x,o\<c aAN -(i + \ti\)- N -w 

for every N > 0. The lemma is proved in the case when all eigenvalues of the principal 
symbol are simple. 

Consider now the case of a multiple eigenvalue A , say. Suppose the corresponding 
eigenspace is of dimension p and spanned by r±, . . . , r2. The reasoning for simple eigenvalues 
does not apply because the p solvability conditions are too many for purely diagonal lower- 
order corrections. Instead, the block corresponding to A is now perturbed as 



( x ° ' 


' °^ 


1+ 


An • 




+ 


An • 




+ 




• A y 






• A 1 1 

pp/ 




Ui • 


• A 2 1 

pp/ 





where each A™- is homogeneous of degree — n + 1 in £. At the leading order, The p equations 
relative to Ao are 

(a°-A /)r) = - ej ° + X>% (4.10) 
i=i 

where e!- is the principal symbol of a ft — (J A . Solvability requires that the projection of 
the right-hand side on each of the r®, i = 1, . . . ,p vanishes. This unambiguously determines 
all the components of the p-by-p block A 1 as 

A 1 - = r° • e° 

All blocks relative to other eigenvalues are solved for in a similar way, yielding a block- 
diagonal structure for the zeroth order correction A 1 . Each block should have dimension 
equal to the multiplicity of the corresponding eigenvalue in order to meet the solvability 
requirements. 

The perturbed eigenvectors r\,...,rp are determined as previously once the \}j are 
known. The same reasoning applies at all orders and thereby determines A and R. Con- 
vergence issues are addressed using cut-off windows just as before. □ 

The above construction indeed provides efficient decoupling of the original problem ()4.1|) 
into polarized modes. 

Lemma 4.2. In the setting of Lemma \4-l[ the solution operator E(t) for j4-l\ ) may be 

decomposed for all times t > as 

E(t) = Re- itA L + S(t), 

where the matrices of operators A and R are defined in Lemma \4.1\ and S(t) is (another) 
matrix of smoothing operators of order —oo. In addition, 

1. L is an approximate inverse of R, i.e. RL = I and LR = I (mod smoothing). 

2. L is a pseudo- differential of order zero (componentwise) . 

Observe that e~* <A inherits the block structure from A, and is diagonal in the case where all 
the eigenvalues are simple. 
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Proof. Begin by observing that R = r(x,D) — as an operator acting on L 2 (R 2 ,C m ) — is in- 
vertible modulo a smoothing additive term. This means that one can construct a parametrix 
L so that LR = I and RL = I with both equations holding modulo a smoothing operator. 
To see why this is true, note that the matrix r(x,£) is a lower-order perturbation from 
the unitary matrix r°(x,^) of eigenvectors of the principal symbol a°(x,£). The inverse 
of r°(x,£) is explicitly given by t°(x,0 = r°(x,£)*. The symbol of L can now be built 
as an expansion £° + l l + . . ., where each £ n (x,£) is homogeneous of degree — n in £ and 
chosen to suppress the Od^l - - 7 ) contribution in RL — I as well as LR — I. This construction 
implies that L is pseudo-differential of order zero (componentwise). All of this is routine 
and detailed in |16j [page 117]. An open question is whether L can simply be chosen as the 
adjoint of R. 

In the sequel, S, S\ and S 2 will denote a generic smoothing operator whose value may 
change from line to line. The composition of a pseudo-differential operator and a smoothing 
operator is obviously still smoothing. Set / = Lu and let A = a(x, D), so that dtu = —iAu. 
On the one hand, u = Rf — Su and 

d t u = Rd t f - Sd t u = Rd t f - SAu. (4.11) 

On the other hand, Lemma 14. II gives 

Au = ARf - ASu = RAf + Si/ + S 2 u = RAf + Su (4.12) 

Comparing (|4.11j) and (|4.12j) . and applying L gives 

d t f = -iAf + Su. (4.13) 

This can be solved by Duhamel's formula, 

f(t) = e~ itA f{0) + f e~ i{t ^ K Su{T)dT. (4.14) 
J 

We now argue that the integral term is, indeed, a smoothing operator applied to the initial 
value uq. 

• First, the evolution operator E(t) = e~ ltA has a kernel K(t, x, y) supported inside 
a neighborhood of the diagonal y = x and for each s > 0, is well-known to map 
H s (R 2 ,C m ) boundedly onto itself HHJ. Therefore, SE(t) maps H~ s to H s boundedly 
for every s > and has a well- localized kernel in the sense of (|4.5jl . This implies that 
SE(t) is a smoothing operator. 

• Second, section T4.4I shows that e~** A is, for small t, a FIO of type (1,0) and order 
zero, modulo a smoothing remainder. The composition of a FIO and a smoothing 
operator is smoothing. For larger t, think about e~ ltA as the product (e -i « A ) n for 
appropriately large n. 

• And third, the integral extends over a finite interval [0, t] and may be thought as an 
average of smoothing operator — hence smoothing. 

In short, f(t) = e~ ztA f(0) + Suo- Applying R on both sides of (|4.13|) finally gives 

u = R e - itA Lu + S lUo + S 2 u = (Re~ itA L + S)u 

which is what we set to establish. 

It remains to see that the evolution operator e~ ttA for the polarized components has the 
same block-diagonal structure as A itself. This is gleaned from equation (|4.13|) : evolution 
equations for two components f Ul , f U2 (corresponding to distinct eigenvalues) are completely 
decoupled. □ 



28 



4.4 The Fourier Integral Operator parametrix 

Lemma 14.21 explained how to turn the evolution operator E(t) into the block-diagonal rep- 
resentation e _l<A . In this section, we describe how each of these blocks can be approximated 
by a Fourier Integral Operator. The ideas here are standard and our exposition is essentially 
taken from Jl] and [2S]. The original construction is due to Lax |18| . 

Let us first assume that all eigenvalues of the principal symbol a°(x, £) are simple. This 
is the situation where the matrix of operators A (Lemma 14. 1|) is diagonal with diagonal ele- 
ments K v . Put E v (t) = e _4 * Ay , the (scalar) evolution operator relative to the vth. polarized 
mode. We seek a parametrix F u (t) such that S v (t) = E v {t) — F v {t) is smoothing of order 

— CO. 

Formally, 

f(t,x) = J E u {t)(e ix< ) /o(0 d£. 
Our objective is to build a high-frequency asymptotic expansion for E v (t){e lx "^) of the form 

e^(^)^(t,x,0, (4.15) 

where a v ~ + a\, + . . . and cr" is homogeneous of degree — n in £. As usual, one obtains 
converging expansions through the use of appropriate low-frequency cutoffs as in the proof 
of Lemma 14.11 

As is classical in asymptotic analysis, we proceed by applying M u = dt + iA u to the 
expansion 1)4.15)1 and successively equate all the coefficients of the negative powers of |£| 
to zero, hence mimicking the relation M v E u {t){e lx '^) = which holds by definition. For 
obvious reasons, we also impose that 1)4.15)1 evaluated at t = be e lx '^. 

We assume that $> v together with its partial derivatives in t and x be homogeneous 
of degree one in £ (note that & v (0, = x ■ £). Next, A u is also expressed as a polyho- 
mogeneous expansion A v ~ X^j>o M>(x f D), where each symbol \i(x,£) is homogeneous of 
degree — j + 1 in £, compare with the proof of Lemma 14.11 In passing, note that A^(x, £) is 
the i^th eigenvalue of the principal symbol a°(x,£), hence our consistent use of notations. 

On the one hand, 

+ (.16) 

On the other hand, the action of A„ on the oscillatory product e l ^"o u is more complicated 
but was studied in ^J|2S]. The following formula is available: 



K[e l ^a u ] ~Y,^ X »( x > V **"fo x 'W D fi^ (4.17) 



-i<S> 

/ 



where 

q(t, x, y, = y, - x, f ) - V x § v (t, x, £) • (y - x). 

Both equations (|4. 16|) and 1)4.1 7J) are poly homogeneous expansions. It is rather tedious to 
show that this property indeed holds for ()4.17)) and the discussion can be found in |28| . 
Consider now the leading order relation (coefficient of |£|), 

i[^ + X° u (x,V x ^)]a° u = 0. 
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For nonzero cj^, the term within brackets must vanish. For each v, we are thus in the 



presence of a Hamilton-Jacobi equation 

d$> 



dt 



+ A°(x,V a $ I/ ) = 0, (4.18) 



which are the well-known equations for the characteristic surfaces of the original problem 
(|4.1|) . As observed earlier, the initial value is chosen as <Ev(0,a;,£) = x ■ £. 

Local existence and uniqueness for (|4.18f) with the given initial condition is ensured as 
soon as a°(cc,£) is C 2 , from the knowledge that is constant along the bicharacteristics. 
See also [2Bj- The AjJ's inherit the boundedness and smoothness properties of a°(x,£). 
The solution to ((4,18(1 is not expected to be global in time, because <!>„ would become 
multi-valued when rays originating from the same point xq cross again later. This typically 
happens at cusp points, when caustics start developing. We refer the reader to [Til 13*3*] . 

Because the equation 1(4.18(1 is homogeneous of degree one in $> v , the degree of homo- 
geneity of Q v in £ must match that of the initial condition at all times. This validates our 
assumption that the phase is of degree one. 

Consider the relation at the next order (coefficient of |£|°), 

f)$> drr° 

i[~Of + V 4 *„)]ai + ~of+ V « A °(^ VA) • V x a° v = -i\l(x, V^K- 

The first term vanishes from 1(4.18(1 . What remains is a transport equation for <r^ along the 
bicharacteristic vector field dt + V^A°(x, V^^) • V x . This determines cr° uniquely from its 
initial value cr°(0, x, £) = 1. It is a smooth function of x and £, homogeneous of degree zero 
in £. 

The other cr™ are determined successively in a very similar fashion. Each relation corre- 
sponding to a negative power — n of |£| is a transport equation along the same vector field: 

drr n 

+ V 5 A°(x, V^,) • V x a: = P n {al . . . , a"), (4.19) 

where P n is a known differential operator applied to <r°, . . . ,<t™. Because of ((4.18(1 . the 
unknown cr" +1 will always disappear from the relation at the degree —n. The initial con- 
dition is cr"(0, x, £) = for n > 0. Since by construction, the right-hand side of ((4.19(1 is 
homogeneous of degree —n in £, the same holds for o™. The smoothness of each er™ in x 
and £ is inherited from that of the principal symbol a°(cc,£). 

In the case where some eigenvalue A^ has multiplicity p > 1, the construction of a FIO 
parametrix roughly goes the same way. Let us use the same notation A v to denote the p- 
by-p block corresponding to A^ in the matrix A of Lemma l4.ll The corresponding one-way 
evolution operator is still denoted by E v (t) = e~ ttA " . We seek a p-by-p matrix of operators 
F v (t) approximating E u (t). The correct asymptotic expansion for E u (t)(e lx '^) is now 

oo 
n=0 

where <& u is a single scalar phase function and each <x™ is a p-by-p matrix, homogeneous 
of degree —n in £. Again, we apply M v = Idt + iA u to this expansion and successively 
equate identical powers of |£| to zero. The leading-order relation is satisfied if and only if 
$> v solves 1(4.18(1 with initial condition $^(0, x,£) = x-£. The next equation, corresponding 
to is a system of p scalar evolutions equations which determines each column of er^ 
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uniquely from the initial condition cr^(0, x, £) = /. This situation repeats itself at all orders 
and completely determines the expansion. 

In all cases the parametrix F v thereby defined takes the form of a FIO 

FAt)fo(x) = J e^^W(i,x,£)/ (O^, (4.20) 

with (Ev the phase function, homogeneous of degree one in £, and a v the amplitude, poly- 
homogeneous of degree in £. The amplitude is scalar- valued or matrix- valued depending 
on the multiplicity of A°. A suitable choice of cutoff functions in £ will remove the singular 
behavior of each |£| -Tl at the origin and guarantees convergence of the series Q4.15|) . This 
technicality, however, is the reason why F v (t) is not exactly equal to E v (t) but is only an 
approximation . 

We emphasized that & u may be defined only for small times. Put 2t* the infimum time 
for which a solution to ()4.18[) ceases to exist, uniformly in v. In order to establish results for 
large times, we will simply compose evolution operators; e.g. E u (nt*) = E v (t*) . . . E v (t*). 

Lemma 4.3. Define t* as above. In the setting of lemma \4~l\ denote by K v a block of A 
and E v {t) = e _4 * A ". Then for every < t < t* , there exists a parametrix F v {t) for the 
evolution problem dtf + iA v f = 0, that takes the form of a Fourier Integral Operator. For 
every such t the phase function Q v is positive-homogeneous of degree one in £ and smooth 
in x and £; the amplitude a v is a symbol of type (1,0) and order zero. The remainder 
S u (t) = E u (t) — F u (t) is a smoothing operator of order — oo. 

Proof. The proof is a slight variation on the argument presented in |28j [Pages 120 and 
below] and is omitted. □ 

4.5 Sparsity of smoothing terms 

The specialist will immediately recognize that a smoothing operator of order — oo is very 
sparse in a curvelet frame. This is the content of the following lemma. 

Lemma 4.4. The curvelet entries of a smoothing operator S obey the following estimate: 
for each N > 0, there is a constant Cm such that 

|<^,Syy)| < C N ■ 2-W\ N (l + \x„ - x^\)- N . (4.21) 

Note that (j4"2T|) is a stronger estimate than that of Theorem ll.il Indeed, our lemma 
implies that 

\{(Pfj,, S(pfj,')\ <C N -uj(n,h u (t,ti'))~ N 
which is valid for each N > and regardless of the value of v. 

Proof. We know that S maps H~ s to H s for arbitrary large s, so does its adjoint S* . As a 
result. 

< \\s*<pJhs \\(Pfjf ||^_ s || wll^-. ||£Vy Whs 
<C-||<^'IIh-»IMh-» <c-2-(^"K 
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On the other hand remember that curvelets have an essential spatial support of size at most 
O(l) x O(l), which is the case when j is small (wavelet regime j = 0). The action of S is 
local on that range of distances, so that 

K^svy)! < c n • (i + - vl) _7V - 

for arbitrary large TV > 0. These two bounds can be combined to conclude that the matrix 
elements of S are negligible in the sense defined above. □ 



4.6 Proof of Theorem 11.11 

• Let us first show how the first assertion on the near-exponential decay of the curvelet 
matrix elements follows immediately from the second one, equation p.!7|) . Let a be 
either a row or a column of the curvelet matrix and let |a|( n ) be the n-largest entry 
of the sequence \a\. We have 

i \ l / M ^ ii n 
n-\a\ (n) <\\a\\t 1/M 

and, therefore, it is sufficient to prove that the matrix E has rows and columns 
bounded in l p for every p > 0. Consider the columns. We need to establish 

sup^|£(i;//,z,;//,i/)| p <C t , p , (4.22) 

for some constant Ct iP > growing at most like C p e Kpt for some C p , K p > 0. 

The sum over v and the sup over v' do not come in the way since these subscripts take 
on a finite number of values. The fine decoupling between the m one-way components, 
crucial for equation (|1.17|) . does not play any role here. 

Let us now show that there exists N, possibly very large, so that 

uniformly in /i'. For the sum in k and I we can use the bound (|7.3|) with Np in place 
of TV, provided Np obeys Np > 2. What remains is 

i>o 

which is bounded by a constant depending on TV and p provided again that Np > 2. 

Hence we proved the property for the columns. The same holds for the rows because 
the same conclusion is true for the adjoint E(t)*; indeed, the adjoint solves the back- 
ward initial-value problem for the adjoint equation ut = A*u, and A* satisfies the 
same hyperbolicity conditions as A. We can therefore interchange the role of the two 
curvelets and obtain 

sup^|£(t;//,i/;^)| p < C t>p . 

Note that the classical interpolation inequality shows that E(t) is a bounded operator 
from £ p to t p for every < p < oo. 
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We now turn to 1)1.17}) . Let us assume first that all eigenvalues A^ of the princi- 
pal symbol a are simple. According to Lemmas 14.21 and 14.31 each matrix element 
E(t; u\ u') = e v ■ E(t)e u > of E(t) can for fixed (possibly large) time t > be written 
as 

m 

E(t;u;v') = R v y'{e~^ K ") n L v t iy + S v y(t). (4.23) 

v"=l 

We have taken n large enough — proportional to t — so that e _J « A,/ is a Fourier Integral 
Operator (mod smoothing) for every v. Each R u y and L v y is pseudo-differential of 
order zero and S u y(t) is smoothing. 

Thanks to Lemma l4.4| it is sufficient to establish the theorem about the the first term 
of (|4.23|) . To do this, we invoke Theorem 15.11 from section |S] about the sparsity of 
FIO's in a curvelet tight-frame. 

It is an interesting exercise to notice that the ray dynamics is equivalently expressed 
in terms of Hamiltonian flows, 



A(i) =V^X°(x{t),C(t)), x(0)=x o , 

or canonical transformations generated by the phase functions & L 

x = V s $(t,x(t),f ), 
£(t) = V x $(t,x(t),6), 



(4.24) 



(4.25) 



provided x,£) satisfies the Hamilton-Jacobi equation §j| + X°(x, Vj;$) = with 
the initial condition <I>(0,x,£o) = x ' (,o- We obviously need this property to ensure 
that the geometry of FIO's is the same as that of hyperbolic equations. 

Pseudo-differential operators are a special instance of Fourier integral operators so 
the theorem equally applies to them. For E(t; fi, u; fi' , u') = ((p^, E(t; u; v')^p'^) we get 



\E(t;n,v,fj,',v')\ <C N ^2 ^■■■^u(fj,,fx ) Jv 'uj(fj, , Hiv"{-)) 



n 

u" = l fJ.0 /In 



w(//n-l>Mn I /"(-)) N u{Hn,n') N , 



n 



for all A > 0. Inequality Q4.2|) then follows from repeated applications of properties 
3 and 4 of the distance oj, see proposition 12.21 The power growth in t of the overall 
multiplicative constant comes from the number of intermediate sums over fj,Q, . . . , \i n . 
There are n + 1 ~ t such sums and they each introduce the same multiplicative 
constant Cat. 

The reasoning is the same when at least some eigenvalues are degenerate. The 
subscript v" now denotes the flows i.e., the eigenvalues A[J„ not counting their multi- 
plicity. Each R u y is a row vector, e~ % ~^ v " a matrix and L v n v > a column vector. The 
FIO parametrix for e~ l ™ A "" was constructed in such a way that only one flow h v n 
appears in the majoration of its curvelet elements (componentwise). There is no in- 
termediate sum over uq, . . . , u n and that's the whole point of decoupling the polarized 
components before constructing the FIO parametrix. 
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4.7 Relation to hyper-curvelets 



In section ITU we introduced hyper-curvelets as 'polarized' curvelets which would not split 
into m molecules along the m different flows. In light of section 14.31 it is interesting to 
reformulate our main result (|4.2|) in terms of hyper-curvelets. We recall that 

<p$=r° v (x,D)cp„(x) = j^yj e^<r° u (x,OMOdt 

Corollary 4.5. Define E^°\t; /x, v\ //, u') = (ipffl , E(t)(p^} u ,} . Then under the same as- 
sumptions as that of theorem li.il we have for all N > 

\E^(t;fx,u^',u')\ < CtN-i^^it))-" + 2-1' ufoM)T N Y (4-26) 

The main contribution to the right-hand side of this inequality is the due to the z/th 
flow. All other flows are weighted by the small factor 2~i = 0(|£| -1 ) on the support of ip^. 
In other words, the leaking of energy from one component v to all others when following 
hyper-curvelets (pffi is smoothing of order —1, hence small at small scales. 

Proof. Equation (|4.26|) follows from Theorem 1 1 . 1 1 and the fact that the adjoint of the matrix 
operator R° whose columns are the R® = r®(x,D) is an approximate left inverse for R° — 
up to an error smoothing of order —1. Indeed, by the standard rules for composition and 
computation of the adjoint of pseudo-differential operators, 

(R°yR v , = ((r u y%r u ,)(x,D) 

= {{r u fr° u ,){x,D) + order-1 
= S UV 'I + order — 1. 

We have used the fact that the dispersion matrix a°(x, £) is assumed to be symmetric, hence 
admits an orthobasis of eigenvectors r®(x,£). We then conclude from Theorem 15 . 1 1 applied 
to pseudo-differential operators of order —1. □ 

Alternatively, we could have defined hyper-curvelets as 

tpff = r u (x,D)^(x) = J e ta <r v {x,OMO^. 

This would have given the same result. 3 The reason why we did not use hyper-curvelets in 
the preceding sections is that they do not necessarily constitute a suitable practical basis 
to decompose wavefields onto. We do not even know if they always constitute a frame. 
Digital implementation would also seem less obvious. 

5 Representation of FIOs 

The purpose of this section is to show that Fourier Integral Operators admit a sparse and 
well-organized structure in a curvelet frame. The main result, Theorem 15.11 is a key step 
in completing the discussion of the previous section. (Observe that by construction, the 
FIO's encountered in the previous section satisfy all the assumptions stated in section fTpl 
right below (11.1911 .) As in the previous section, we will restrict the discussion to x £ M 2 
which is no loss of generality, see section El 

3 We can only conjecture that the decoupling should be better if we use the improved (pj^ . 
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5.1 Main results 



In the introduction section, we detailed a notion of Hamiltonian correspondence for hyper- 
bolic equations. This correspondence also exists for FIO's and is 'encoded' in the phase 
function $ of the FIO. It is called the canonical transformation associated to and is 
defined as the mapping (x,£) — ► (y,rj) of phase-space 

x = V^(y,0, r] = V x $(y,0- (5-1) 

As suggested in section 14 .6( this formulation is equivalent to that involving trajectories 
along the bicharacteristic flow as in equation (jl.tjf) . provided the phase function solves an 
appropriate Hamilton-Jacobi equation. This canonical transformation induces a mapping 
of curvelet subscripts, denoted by y! = h(fi). 

The main result for this section reads as follows. 

Theorem 5.1. Let T be a Fourier Integral Operator of order m acting on functions ofM 2 , 
with the assumptions stated above, and T(/i; //) denote its matrix elements in the complex 
curvelet tight frame. Then with h the curvelet index mapping and u> the distance defined in 
the elements T(fi; fi') obey for each N > 

\T{p^')\<C N -2^' U (jjL,h^'))~ N , 
for some CV > 0. Moreover, for every < p < oo, (T(fi, fx')) is bounded from £ p to £ p . 

The interpretation of Theorem 15.11 is in strong analogy with that of Theorem 11.11 
Namely, a FIO has the property of transporting and warping a curvelet into another 
curvelet-like molecule. (Again, the choice of using complex- valued curvelets is not essential, 
as a real curvelet would be mapped onto two molecules.) 

The proof of Theorem relies on the factorization of T on the space-frequency support 
of ipp as a nice pseudo-local operator followed by a smooth change of variables, or 
warping Tjj This decomposition goes as follows. 

Let be a fixed curvelet centered around the lattice point (x^, £ M ) in phase-space. The 
phase of our FIO can be decomposed as 

$(x,0 = <f>s(x,^)-Z + 5(x,0, ^(x) = ^(x,^). (5.2) 

In effect, the above decomposition 'linearizes' the frequency variable and is classical, see 
|24l I29j . With these notations, we may rewrite the action of T on a curvelet ip^ as 

(T^)(x) = J e^<*He*^a(x,0M0d£- (5-3) 
Now for a fixed value of the parameter /i, we introduce the decomposition 

where 

(T 1>fM f)(x) = J e ix %(x,0f(0dt (T 2 ,Mx) = f(M x ))i (5-4) 

with bfj,(x,^) = e* 5 ^' 1 ^^a{<j)~ l {x), £)). This decomposition allows the separate study of 
the nonlinearities in frequency £ and space x in the phase function The point is that 
both T\ fj, and are sparse in a curvelet tight frame — only for very different reasons. 
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Theorem 5.2. Let (y?^)^ be a tight frame of curvelets compactly supported in frequency. 
For each p, Ti „ maps ip^ into a curvelet molecule with arbitrary regularity R, uniformly 
over p in the sense that the constants in estimates \2. 5\) and \2.b)) do not depend on p. 

As we shall see, the proof of Theorem l5.21 presented in section 1531 relies on the property 
of compact support in frequency of the In contrast the corresponding result for the 
operators T%„ which we present next, is extraordinarily simplified if one uses curvelets 
compactly supported in space. Although well localized in space, the tight frame introduced 
in section |21 does not meet this requirement. In order to circumvent this technical difficulty, 
we introduce compactly supported curvelet atoms in section 15.31 They are built on the 
model of atomic decompositions, standard in approximation theory |17j . 

Theorem 5.3. Let {p^)^ be a family of complex-valued curvelet atoms, compactly supported 
in space, with regularity R. Denote by h the canonical index correspondence associated to 
<I>, as defined above. For each p, Ti,^ maps p^ into a molecule m^-j of the same regularity 
R, uniformly over p. 

The latter theorem says that the 'warped' atom p^ o 0„ is still an atom, only its scale, 
orientation, and location may have been changed. That a smooth warping preserves the 
sparsity of curvelet expansions is a result of independent interest. 

The remaining 3 sections are devoted to the proofs of Theorems 15.21 15.31 and 15.11 The 
dependence of (j)^ upon p is not essential in proving Theorems 15.21 1531 as the only property 
of interest is that the derivatives of 4>^ are bounded from above and below uniformly over p 
(which follows from our assumptions about $). This is the reason why in the next sections 
we will drop the explicit dependence on p and work with a generic warping <j). 

5.2 Proof of Theorem IQl 

We will assume without loss of generality that our curvelet ip^ is centered near zero (k = 0) 
and is nearly vertical (6^ = 0). 

Set = Tup^. We first show that obeys the smoothness and spatial localization 
estimate of a molecule 1)2. 5JI . With the same notations as before, recall that is given by 

m,{x) = Je ix -%(x,0M0^, b^x, £) = e^^a^ \x), £)• (5.1) 
To study the spatial decay of m^x), we introduce the differential operator 

and evaluate the integral lj5,lj) using an integration by parts argument. First, observe that 

/ \ N 

L?e ix < = { 1 + 12'xil 2 + \^l 2 x 2 f\ e ix <. 



^ e * = ll + \i J xi\ + \l J > X2\ 

Second, we claim that for every integer A > 0, 

\Lf[b tl (x,O0^)]\<C-2-^. (5.2) 

(The factor 2 -3j / 4 comes from the L 2 normalization of (pa.) This inequality is proved in 
appendix 17.21 Hence, 

m p {x) = (l + |2>'s 1 | 2 + \Vl 2 x 2 \ 2 )~ N J Lf [6 M (x,0^(0] 
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Since \L^ [bp(x, £)<p,j,(€)]\ < C ■ 2 3j//4 and is supported on a dyadic rectangle R^, of length 
about 1? and width 2^ 2 , we then established that 

23j/4 

~ (l + \2?xi\* + \Zi/*X2\*) N 
The derivatives of are essentially treated in the same way. Begin with 

0+ip<a 
/3+</p<o 

Therefore, the partial derivatives of are given by 

(^m M )(x)= £ W*), (5.3) 

f3+ip<a 

where 

W*) = J e^ib^O^MO^. (5-4) 

First, observe that on the support of (p^, obeys < -2^ 2 ^ 2 . Second, the term 

dxb(x,^) is of the same nature as b^(x,^) in the sense that it obeys all the same estimates 
as before. In particular, we claim that for every integer N > 0, 

|Lf [d^(x,O^0„(O)\ < C ■ 2~ 3j/4 ■ 2 jft • 2^ 2 . (5.5) 
Hence, the same argument as before gives 

93J/4 . oiA . 2^2/ 2 

\IbJx)\ <C w . 

~ (1 + 12^x12 +\V/2 X2 \2) N 

Now since j3 < a, we may conclude that 

2 3 J'/ 4 . 2 jai • 2 jfa2/ ' 2 



W x m,){x)\ <C 



(l + \2>xi\ 2 + \2i/ 2 x 2 \ 



This establishes the smoothness and localization property. 

The above analysis shows that is a "ridge" of effective length 2~ J / 2 and width 2~ J ; 
to prove that is a molecule, we now need to evidence its oscillatory behavior across 
the ridge. In other words, we are interested in the size of the Fourier transform at low 
frequencies (j23)) - (j23|) . 

Formally, the Fourier transform of is given by 

™m(0 =11 e^SX&ri^Wdxdr}. (5.6) 



We should point out that because the amplitude b is not of compact support in x, the 
sense in which (|5.6j) holds is not obvious. This is a well-known phenomenon in Fourier 
analysis and a classical technique to circumvent such difficulties would be to multiply 
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(or equivalently &«) by a smooth and compactly supported cut-off function x( ex ) an d let e 
tend to zero. We omit those details as they are standard. 

Set Di = —i-J^- To develop bounds on Im^COl) observe that 

An integration by parts then gives 

m„(0 = / / e^D? (eT^b^o) ^^(r,) dxd V . 

Hence, 

N 
m=0 

where 

Note that F m is exactly of the same form as (|5.4[) — but with rf[ n instead of rf — and 
therefore, the exact same argument as before gives 

93j/4 2~i n 

\ F ™(X)\ < C ■ - - - ' — 



We then established 
which gives 



[l + |2J'xi| 2 + |2-?/ 2 ar 2 | 2 ) 

K(oi<c-2- 3 ^. 2 -^.(i+ier), 



as required. This finishes the proof of Theorem 15.21 

The careful reader will object that we did not study the case of coarse scale curvelets; it 
is obvious that coarse scale elements are mapped into coarse scale molecules and, here, the 
argument would not require the deployment of the sophisticated tools we exposed above. 
We omit the proof. 

5.3 Atomic decompositions 

As we will see later, to prove our main result and especially Theorem 15.31 it would be most 
helpful to work with tight frames of curvelet compactly supported in space. Unfortunately, 
it is unclear at this point how to construct such tight frames with nice frequency localization 
properties. However, there exist useful atomic decompositions with compactly supported 
curveletdike atoms. We now explore such decompositions. 

In this section, the notation f a g refers to the function obtained from / after applying 
a parabolic scaling and a rotation 

3/4 tin n = ( l / a 



fa,e{x) = a 1 f {DaRex) , D a 



l/y/a 



and where Rq is the rotation matrix which maps the vector (1, 0) into (cos 6, — sin0). Note 
that this is an isometry as 

||/a,6»||L 2 = ||/||l 2 - 
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In Smith proved the following result: let ip be a Schwartz function obeying tp(l, 0) 7^ 
0; then one can find another Schwartz function tp, and a function (/(£) such that the following 
formula holds 

q(0 [ ia,e(OMOadad0 = r(O; (5.7) 

Ja<l 

here r is a smooth cut-off function obeying 




lfl>2 
If I <1 



and g is a standard Fourier multiplier of order zero; that is, for each multiindex a, there 
exists a constant C a such that 

<c a (i + \t\r H - 

This formula is useful because it allows us to express any object whose Fourier transform 
vanishes on {|£| < 2} as a continuous superposition of curvelet-like elements. We now make 
some specific choices for (p. In the remainder of this section, we will take ip{x) = tp(—x) 
and the function ijj of the form 

ip(x 1 ,x 2 ) = ip D (xi) (p(x 2 ), (5.8) 

where both tp and ip D are compactly supported and obey 

Supp if C [0, 1] , Supp ip D C [0, 1] . 

We will assume that ip and ip D are C°° and that the function i[) D has vanishing moments 
up to order D, i.e. 

J V D (^i)xidxi = 0, jfe = 0,l,...,D. (5.9) 
For each a < 1, each b £ M 2 and each 6 £ [0, 2ir), introduce 

1>aM x ) : = Mx -b) = o" 3 /V (D a R e (x - b)) ; (5.10) 
and given an object /, define coefficients by 



K(f)(a,b,e) = J iP aAb (x)f(x)dx. (5.11) 

Now, suppose for instance that / vanishes over |£| < 2, then (|5.7|) gives the exact recon- 
struction formula 

f(x)= [ K(q(D)f))(a,b,9)ip aAb (xMdadedb), (5.12) 

Ja>l 

with fj>(dad9db) = adadOdb. In the remainder of this section, we will use the shorter notation 
dfj, for fj,(dad9db). 

As is now well-established, the reproducing formula may be turned into a so-called 
'atomic decomposition'. Not surprisingly, our atomic decomposition will just mimic the 
discretization of the curvelet frame as introduced in section |2 With the notations of that 
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section, we introduce the cells defined as follows: for j > 0, I = 0, 1, . . . 2 L-? / 2 J — 1 and 
k = (fci, k 2 ) £ Z 2 , the cell is the collections of triples (a,6,b) for which 

2^' +1 ) < fl <2-J', |0-0 M |< ^2~Li/2J 

and 

D 2 -iReb e [h, ki + 1) x [A&, fc 2 + 1). 

Note that Jg (f/i, = 37r/2 for j even, and 37r for j odd. We may then break the integral 
jSHZj) into a sum of terms arising from different cells, namely, 

f{x) = ^a^p^{x) (5.13) 

where 

a M = |IWD)/)llL 2 (Q M ), P^x) = —f TZ( q (D)f))(a,b,6)i; aAb (x)dfi. (5.14) 

Of course, the decomposition ()5.13j) greatly resembles the tight frame expansion, com- 
pare p.5|) . In particular, the atoms p^ are curvelet-like in the sense that they share all 
the properties of the tight frame (v^)/i - only they are compactly supported in space. In 
the remainder of the paper, we will call these elements atoms. Below are some crucial 
properties of these atoms. 

Lemma 5.1. Rewrite the atoms p^ as p^(x) = 2 3j / 4 a^ (D 2 -j Rq^x — k). In other words, 
p^ is obtained from after parabolic scaling, rotation, and translation. For all p, the 
a W > s verify the following properties. 

• Compact support; 

Supp aW C cQ. (5.15) 

• Nearly vanishing moment along the horizontal axis; let m = D/2. Then for each 
k = 0,1,... ,m, there is a constant C m such that 

J a^\x ll x 2 )x k 1 dx 1 < C m -2~^ rn+l \ (5.16) 

• Regularity; for every multiindex a 

\8%aM{x)\<C a . (5.17) 

In $5.16)) and \5.11\) , the constants may be chosen independently of p and f. 

Proof. See appendix !7.2l □ 

Needless to say that curvelet atoms are molecules with spatial compact support, compare 
lemma l5~Tl with the definition of a molecule. Finally, observe (and this is important) that 
it is of course possible to decompose a molecule into a series of atoms 

A*' 
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The coefficients would then obey the same estimate as in lemma |2~31 

\ot^'\ < C N ■ uj(p, p')~ N , (5.18) 

and in particular, for each p > 0, 

swp^2\a ml \ p < C p . 

This is briefly justified in appendix 17.21 
5.4 Proof of Theorem 15.31 

As mentioned earlier, curvelet atoms depend in a nonessential way upon the object / we 
wish to analyze and we shall drop this dependence in our notations. To prove Theorem 15. 31 
recall that we need to show that for each curvelet atom p^ with regularity R, the 'warped' 
atom pn o <j) is also a curvelet atom, with the same regularity. 
As in Section 15.31 we suppose our curvelet atom is of the form 

/)(1 (x) = W(D 2 - J %( I - Ifl )) ) 

where a^> obeys the conditions of Lemma 15.11 (Here, the location may be formally 
defined by x M = (D 2 -jRe^y l k s .) Define y^ and A^ by 

V? = <F 1 M and A M =(V$(y M ) (5.19) 

so that 

4>{y) = + A u (y - y M ) + g(y - y u ). 

With these notations, it is clear that the warped atom p^oip will be centered near the point 
y^; that is, 

p^{y)) = 2 3 ^ 4 fl M {D^Rg^y - + g(y - y,))) . 

To simplify matters, we first assume that A^ is the identity and show that p M o(/> is a curvelet 
atom with the same scale and orientation as p At . Later, we will see that in general, p^ o cf> 
is an atom whose orientation depends upon A^, and whose scale may be taken to be the 
same as that of p u . Assume without loss of generality that 0^ = and y fl = (statements 
for arbitrary orientations and locations are obtained in an obvious fashion) so that 

p^{y)) = 2 3 ^ 4 aM (D 2 -i(l/ + 9{y))) = 2 3 ^b^ (D 2 - jy ), (5.20) 

with 

b M (y) = a (p) ( y + D 2 - jg (D 2J y)). 

The atom oM is supported over a square of sidelength about 1; likewise, is also com- 
pactly supported in a box of roughly the same size — uniformly over p. We then need to 
derive smoothness estimates and show that b^> obeys 

\d a b^\y)\ < C a , \a\ < R. (5.21) 

Over the support of p u o <fi, g = (51,52) deviates little from zero and for each k = 1, 2, g k 
obeys 

\9k(y)\ < C ■ 2~\ \d a g k (y)\ < C ■ 2~^ 2 , \a\ = 1. 
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Similarly, for each a, \a\ > 1, 

\d a g k (y)\ < C a . (5.22) 

These estimates hold uniformly over p. It follows that for 1 2/1 1 , 1 7/2 1 < C and each a, the 
perturbation g obeys 

V ■ \d a gi (2-3 yi ,2-i/ 2 y 2 )\ < C a , V/ 2 ■ \d a g 2 (2^ yi ^/ 2 y 2 )\ < C a . (5.23) 

The bound 1)5.21(1 is then a simple consequence of 1)5.23(1 together with the fact that all the 
derivatives of aS 1 *' up to order R are bounded, uniformly over p. 

We now show that p^ o (ft exhibits the appropriate behavior at low frequencies. 

/W(6= / e- ix -t Ptl (<f>(x))dx 



We will use the nearly vanishing moment property of p^. Set 

Sz(x) = e-^'W-f/l det V^K^" 1 ^)); 

note that over the support of p^ and for each N < R, we have available the following upper 
bound on the partial derivative of 

\d»St(x)\<C N .(l + \Z\) N . 

Classical arguments give 

= E / 9 " S ^ ,X2) dx 2 J Pli ( Xl , x 2 )x\ d Xl dx 2 + E, (5.24) 

where E is a remainder term obeying 

\E\ < C n ■ 2- 3j / 4 • 2^ n • sup \d^S^x)\ < C n ■ 2~ 3 ^ 4 . 2~ jn (l + \t\ n ). (5.25) 

The near-vanishing moment property gives that each term in the right-hand side of 1)5,24(1 
obeys the estimate in (|5.25|) . This proves that the Fourier transform of p^ o <f> obeys 

\p^m <c n - 2-^(1 +ien 

as required. 

We now discuss the case where the matrix is not the identity. In this case, (15.20(1 
becomes 

with 

mil (y)=2 3 ^ 4 a^(D 2J (y+~g(y))), and ~g(y) = g(A^y). 

Our assumptions about FIOs guarantee that \A~ l \ is uniformly bounded and, therefore, 
it follows from the previous analysis that is a curvelet atom. As a consequence Pil o 
4> is a curvelet atom with the same regularity R since it is clear that bounded linear 
transformations of the plane map curvelet atoms into curvelet atoms. 
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5.5 Proof of Theorem 15.11 



Let (p^ be a fixed curvelet and decompose T as ?2 )At0 o T 1|W) . First, Theorem 15.21 proved 
that Ti„ (pu is a curvelet molecule m™, which we will express as a superposition of curvelet 
atoms j o jUl 

Tl,po<PiM> = m Mo = ^A)(Mi>Mo)Pmi- 
Ml 

Second, for each /ii, Theorem 15.31 shows that T2 4L1 p t _ L1 is a molecule mu^s at the location 
h(ni). We are not exactly in that setting since in T2 jMo p Ml , the subscripts do not in gen- 
eral match. This does not pose any difficulty since Theorem 15.31 can be understood as a 
statement concerning general warpings <j>. We can define the map as induced by the 
transformation (x,£) — ► (y,rj) given by 

x = Vj;$(|/,^ ), 7/ = V 5 (y,<<^ ) 

(compare this with equation (|5.1|) ). Then, according to Theorem l5.3l T^uqPlh * s a molecule 
m ^ (Mi) at the location So 

(^2> T 2 lAt0 PMi) = A(M2, Vo(AH))- 

Hence, 

(¥Va>^Vw)) = J^^ 1 ^ 2 ' /l ^(/ i l))/ ? o(^l,/^o)- 

Ml 

Of course, both /?q and /?i obey very special decay properties. 

• By Theorem 15.21 and Lemma 031 |/?o(a*15 A*o)| < C« ■ ^(pi, Po)~ N f° r arbitrarily large 

> 0, provided that the selected atoms are regular enough. 

• By Theorem 15.31 and Lemma l2~3l |/3i(/i2, ^-// (Ml ) ) I — Cat ' <^(^2 ; Vo (Ml)) _JV f° r arbi- 
trarily large iV > 0, provided that the selected atoms are regular enough. 

Theorem 15.11 now follows from the observation that 

^u;(^ 2 , V>(w)) _iV • w (Ml,Mo) _JV < C n -w(/i 2 , V>(^o))^ (Ar_1) , (5.1) 

Ml 

This is an immediate consequence of properties 3 and 4 of the pseudo-distance lj, see 
proposition 12.21 

Cases involving coarse scale elements are treated similarly and we omit the proof. The 
boundedness from £ p to £ p for every p > follows from the same argument as in the proof 
of Theorem ll.il 

6 Discussion 

All along we specialized our discussion to the special case where the dimension of the spatial 
variable is n = 2. It is clear that nothing in our arguments depends upon this specific 
assumption. Indeed, we could just as well construct tight-frames of curvelets in arbitrary 
dimensions by smoothly partitioning the frequency plane into dyadic coronae, which would 
then be angularly localized near regions of sidelength length 2 3 in the radial direction and 
2 jf//2 in all the other directions; in order to this, we would use smooth partitions of the unit 
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sphere of M. n into spherical caps of radius about 2~ J / 2 . All of our analysis would apply as 
is, and would prove versions of Theorem II, II in arbitrary dimensions. 

Our main result assumes that the coefficients of the equation Q1.1J) be smooth. In many 
applications of interest, however, the coefficients may be smooth away from singular smooth 
surfaces. In geophysics for example, we typically have different layers with very different 
physical properties. A very important question would be to know how our analysis would 
adapt to this situation. In fact, it seems natural to believe that sparsity would continue to 
hold in this more general setting. Intuitively, the wave group would still be approximated 
by rigid motion along the Hamiltonian flow. Only, one would need to account for possible 
reflections/refractions. A curvelet hitting a singularity at a small angle of incidence would 
typically produce two curvelets, a reflected and a refracted curvelet. This is merely an 
intuition which one would need to justify by a careful analysis quantifying the behavior of 
a curvelet near the interface (here, the singular surface). We regard this type of question 
as an important extension to this work. 

The estimates proven in this paper are motivated by efforts towards applications as 
sparser expansions theoretically lead to better and faster algorithms. Our goal is to trans- 
form our theoretical insights into effective algorithms, and derive fast accurate solvers to 
certain classes of partial differential equations for which our methods have a comparative 
advantage. In this direction, suppose that a discretized version Ej^(t) of the curvelet ma- 
trix E{t) is known in the sense that it has been precomputed once for all. Assuming that 
discretized version inherits the sparsity from its continuous analog, then for each initial 
value problem, we would have available a fast algorithm for calculating the solution of the 
full wave equation. Indeed, Corollary 11.21 asserts that the truncated matrix trunc(i?jv(i)) 
obtained by keeping only (^(e 1 /" 1 ) terms per row obeys 

\\E N (t) -tvnnc(E N (t))\\ < e. 

Therefore, ignoring the cost of the digital curvelet transform, the total cost of an algorithm 
calculating the solution of a full wave equation to within accuracy e would be linear in 
the number of unknowns (number of voxels) and scale at most like C(e) • N. This is not 
the most practically relevant situation, however, as in general, one would not know E^(t). 
Work in progress shows that one can still design effective algorithms to 'build' the matrix 
E]y(t), and for all practical purposes compute the solution of the full wave equation for a 
given accuracy e in 0(N(log N) 2 ). 

Finally, we would like to conclude by pointing out that there is now considerable evidence 
that the curvelet transform is a very useful mathematical architecture; curvelets can do 
things that other classical systems simply cannot do. This paper proved that tight-frames 
of curvelets provide optimally sparse representations of large classes of linear systems of 
hyperbolic differential equations. But they also allow for optimally sparse representations 
of wavefront phenomena Further, they can also be useful in many different settings. 
For example, they have useful microlocal features which make them especially suitable 
for deployment in many inverse problems and especially limited-angle tomography [8]. In 
short, curvelets addresses a new range of problems, going beyond what traditional multiscale 
systems offer. 
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7 Appendix 

7.1 Additional proofs for section [5] 

Proof of proposition \2. 6 A 

These four properties were already formulated in |27| . although with a slightly weaker 
definition of pseudo-distance. Properties 1 and 2 are not proved in that reference, and 
property 3 is not extensively documented. We give the justification for these three results 
for completeness. 

1. We are to show that X d(fi',n). With e M = ^/|^|, this is 

|(e M , Ax) | + \Ax\ 2 + \A9\ 2 x |(e^, Ax)\ + \Ax\ 2 + |A0| 2 . 

It is sufficient to notice that 

\{e^,Ax)\ + \Ax\ 2 + \A9\ 2 ^\(e^,Ax)\ + \{e^,Ax)\ + \Ax\ 2 + \A6\ 2 . 

In order to justify the nontrivial inequality, use the law of cosines illustrated in Figure 
□ 

|<e M , Ax)| 2 + |< V , Ax)| 2 = sin 2 \A0\ (d 2 + d 2 ,) 

= sin |A0| |Ax| 2 ±2|( eAt ,Ax)| |(e M /,Ax)| cos I A#| 
< sin 2 |A0| |Ax| 2 + 2|(e At ,Ax)| | (e^ , Ax) \ . 

It follows that || (e M , Ax)\ - |(e M >, Ax)\\ < C • |A0||Ax| < C ■ {\A6\ 2 + \Ax\ 2 ) and, 
therefore, 

|(e M , Ax) | + |(v, As) | < C- (2|(e M , Ax)| + |A0| 2 + |Ax| 2 ). 



2. Recall that = 2^'l(l + 2 min ^")(i(/i, //)). Let us show that d(/x,|/) < C • 

(<i(/i, /x")+cZ(/i", //)). To simplify notations, set in the coordinates defined by {e^, e^}, 

x M = (0,0) ay = (xi,x 2 ) ay = (2/1,2/2) 

e M = (1,0) e^/ = (cos a, sin a) e^i = (cos /?, sin /?) 

|^-^| = |/3| |^'-^'| = |«-/?| 

It is enough to show that there exists e > such that 

e|xi| < |yi|+| cosa(xi-yi)+sin/3(x 2 -y2)|+(|/3|+|a-/?|)(|yi|+ki-2/i|+|y2|+k2-y2|), 

because then (|/3| + |a - /3|)(|j/i| + |a?i — yi| + \y 2 \ + \x 2 -y 2 \) < C • (|/?| 2 + |a-/3| 2 + 
1 7/1 1 2 -h | — yi| 2 + |y 2 | 2 + |x 2 — y 2 | 2 ). By contradiction let us assume that the inequality 
fails. Then we must have \y\\ < e|xi|. It is always true that |xi — y\\ + > |xi| 
so it is necessary that \(3\ + \a — j3\ < e. But then |a| < 2e thus cos a > 1 — 4e 2 
and |sina| < 2e. The term |cosa(xi — y±) + sin/3(x 2 — y 2 )\ is therefore always 
greater than (1 — 4e 2 )|xi — y\\ — e\x 2 — y 2 \- But this quantity must also be less 
than e|xi — yi\, otherwise its sum with \y\\ would exceed e|xi|. So we must have 
\ x 2 — J/2 1 > 1 " £ ~ 4<; |xi — But then the sum \y\\ + |xi — 2/i| + |x2 — y 2 \ must 
dominate which implies + |a — /3| < 2e 2 . By induction, a = (3 = and 
1 2/1 1 + \x\ — 2/1 1 > |xi| yields a contradiction. 
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Figure 6: Relative position and orientation of two curvelet molecules in x-space. The 
ellipses indicate their essential support. 

3. Let us show that £V uj(fj,o, ^i)~ N ■ ^2)~ N < Cat • w{hq, ^ 2 )~^- n ~ 1 ' > - We closely 
follow and expand the argument in [57]. We will need to use d((J,o, fj,±) x d(fj,i, fJ,o), as 
we have just showed. 

Define / w by 

hi '■= u(iJL2,liiY N ■ /xo) _JV (7.1) 

/ \ —N 

= (2\h-h\+\h-jo\^ + 2 min ^'«)d(// 2 ,/ii))(l + 2 min ^^)d(/i ,Mi))J • 

To ease notations, put temporarily a = 2 min ^°^\ a 2 = 2 min ^ 2 ' jl \ d i = d(/io,/ii), 
and d\2 = d(fi 2 , Hi)- We develop a lower bound on (l + a 2 (ii 2 )(l + aodoi) = l + a 2 <ii 2 + 
a o^oi + a 2^i2flo^oi- We make three simple observations: first, 

a 2 d 12 + a d 01 > min(a 2 , a )(di 2 + d i) = A , and <ii 2 + d i > C ■ d(fj, , fi 2 ); 

second, 

a 2 d 12 + a d i > max(a 2 d 12 , a d i) > max(o 2 , a ) min(d 12 , d 01 ) = B ; 
and third 

a 2 di 2 a d m = max(a 2 , a ) min(a 2 , a ) max(di 2 , d i) min(di 2 , d i) 
> max(a 2 , a ) min(a 2 , a ) min((i 12 , <i i) — ^TT~~ ~ = 

This gives 

1 + a 2 d 12 + a d 01 + a 2 d 12 a d 01 > - (1 + A + £ + ^ 5 ) > -(1 + A )(l + B ). 
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We replace the values of Aq, Bq by their expression, use the relation A$ > d{^,fi 2 ) 
and obtain 



< C ■ 2-^ 2 - jl ^°- j ^ N ■ (l + 2 mhl ^ jo ^d(fi 2 , Mo)J • (Li)~ N (7.2) 

with 

L x = l + max(2 min W 2 ^),2 min W '^))min(doi,di2)- 

Note that 

Li = min (l + max{2 min ^' jl \ 2 mi ^ jo ' j ^)d 12 , 1 + max(2 min ^ 2 ' J1 \ 2 vain ^ '^)d i 
> min f 1 + 2 min( - h ' jl U 12 , 1 + 2 min ^' Jl) (ioi N 



and, therefore, 

(Li)~ N < max ((1 + 2 min ^ 1 U 12 )- N , (1 + 2 min ^° ' jl U 01 y N ^ 
< (1 + 2 min ^ 2 '^d 12 )- N + (1 + 2 min ^°'^d 01 )- N . 

In the sequel we will repeatedly make use of the bound 

+ 2 9 d(/^ //))"* < C • 2 2 ^)+ , (7.3) 

M 

valid for N > 2, any real g and where the subscript + denotes the positive part. 
This is justified as follows. Without loss of generality, assume that // = (j',0,0) so 
that the curvelet j^i is nearly vertical and centered near the origin. We recall that 
AO = 7T • I ■ 2~ LJ/2J J = 0,1,... 2LJ/2J _ X) and Xfi = R^D^k, say. Then the left-hand 
side is 

Yl (l + 2 9 (|2- j/2 ^| 2 + |2- J '/ 2 A; 2 | 2 + |2- J 'A:i|)j . (7.4) 

For j > q this can be seen as a Riemann sum and bounded — up to a numerical 
multiplicative constant — by the corresponding integral 



-JV 



which in turn is less than C ■ 2 2 ^ q ^ provided N > 2. For j < q, the sum (|7.4|) 
essentially consists of a few terms, giving a O(l) contribution. This gives the bound 
C ■ 2 2 C?'~«)+. 

By symmetry, we can now assume jo < j2- Let us consider three cases. 

• < j2 < ji- In that case we have the bound 

(Li)~ N < C • [(1 + 2^d m y N + (1 + 2*^2)-^]. 

Summing this quantity over k\ and i.e., over all [i\ that correspond to a given 
ji , and using (|7.3|) , we obtain for ji > j 2 

53 J m < C - (1 + 2^42)"^ J] 2-f 2j " 1 - J ' 0- -' a > JV • 2 2 ^-^ 
mi ii>j2 

< c . 2 -0' 2 -io)JV (1 + 2 io do2 )-iV = C ■oo(^, l i 2 )- N . 
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• < ji < jo- We now have 

(L!)- N < C ■ [(1 + 2^d m y N + (1 + 2^d 12 )- N ]. 

According to (|7.3|) . the sum over k\ and of (Li)~ N is bounded by a constant 
independent of j\. The remaining sum is 

< C ■ 2-U°+^ N 22nN • (! + 2 il d 02 )^ JV . 
Mi ii < jo 

Observe that 2J' liV (l + 2J' 1 do 2 )~ iV < 2J' o7V (l + 2^°d 02 )~ N , therefore 

£> m < C- 2-C*»-J°)"(l + 2^)^ = C • ^(/xo,^)^. 
mi 

• jo <ji <j2- In that case we still have 

(L!)- N < C • [(1 + ^doi)-^ + (1 + 2*di 2 )-"]. 
summed over fei and l\ into a O(l) contribution. What remains is 
7 mi < C ■ 2-&-i°) N (l + 2^do 2 y N Yl 1 

Ml jo<jl<32 

We conclude by collecting the estimates corresponding to the three different cases. 
Remark that the loss of one (fractional) power of lo in the third case is unavoidable 
unless one modifies its definition in the spirit of [27] . This would however make 
notations unnecessarily heavy. 



4. See E3 p. 804. 



Proof of the inequality \2.1J$ . Assume without loss of generality that [i = /xq- We may 
express S^(^) as S '^(RabO > with A9 = 9^ — 9^. We begin by expressing the integral in 
polar coordinates, 

£i = rcos# (-RasOi = rcos(<9 + A0), 

6 = r sin (#Ae£)2 = r sin(# + A0) . 

As we can see, the cosine factor is not crucial and we may just as well drop it. Consequently, 
[ \SJZ)SA§\ n d£<C- ! rdr- 1 —r^w 

(■2-k 

x / dO[l + a\ sm9\]~ N [l + a'\ sin(6> + A9)\]~ N , 
Jo 

where a = ?. J J-' and a' = 2 3 J_J . This decoupling makes the problem of bounding the 
inner integral on the variable 9 tractable. For example when a > a' > 1, following |23j p. 56, 

/CO I -1 

d9[l + a\9\]- N [l + a'\9 + A9\]~ N < C ■ - 
-oo a [1 + a'|A0|J iV 

We get other estimates for other values and orderings of a and a' . The integral on r is then 
broken up into several pieces according to the values of a, a', j and f . It is straightforward 
to show that each of these contributions satisfies the inequality (|2.14j) . 
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7.2 Additional proofs for section [5] 

Proof of lemma I5.il 

By definition a^'(x) = 2~ 3jf//4 m At [D 2 -jRe ll x — k) and, therefore, 



oW(i) 



(^/)(a,0,6)a 3 / 4 2- 3 ^V( J Da^(^ 1 ^ (^ + fc) ~ 6) ^ 



— / (Rf)(a, 6, 6)| Al 1 ^ {A(x -{(3- k)) dfi, 

Oil i ./ 



(7.5) 



where A = D a RsD 2 j with 5 = 9 — 6^ and f3 = D 2 -jRe b. 

Let us first verify the assertion about the support of . Recall that over a cell Q M , 
[3 E [ki , k\ + 1) x [/c2, ^2 + 1), and hence for all b E Q^, we have 

Supp V (A{x - (0 - k))) C Suppip(Ax) + [0, l] 2 . 

Next Suppip(Ax) C ^ _1 [0, 1] 2 with A' 1 = D 2 - j R^ s D- 1 . It is not difficult to check that 
A _1 [0, l] 2 C [ci,c 2 ) x [di,d 2 ) which then gives ()5.15j) . 

There are several ways to prove the property about nearly vanishing moments. A possi- 
bility is to show that the Fourier transform of a**' is appropriately small in a neighborhood 
of the axis £i = 0. We choose a more direct strategy and show that 



i/>(A(x - I3))x\ dx x < C m ■ 2~ j( - m+1 \ 
uniformly over the (a,6,b) E Q^. The property (|5.1H|) follows from this fact. Indeed, 
i^\x ll x 2 )x\dx l = — [ Rf(a,6,b)dfjL [ \A\ l / 2 ^(A(x - p))x\ dx u 

«M JQu J 



(7.6) 



and the Cauchy-Schwarz inequality gives 

1 



a^(x\,X2)x\ dx-i 



< 



a, 



\\Rf\\L 2 ( Qll ) 



\A\ 1/2 ip(A(x -(5))x\dxi 

\ 1/2 



2 \ V2 
dfl 



\A\ l / 2 ^{A{x - (3))x\dxi 



d/i 



The uniform bound 1)7. 6|) together with the fact that Jq dfi is either 3ir or 3tt/2 gives 
(PHI) . 

We then need to establish (|7.6f) . Let 82 be 8/8x2, recall that by assumption (|5,8|) - ()5.9|) . 
we have that for all X2 E R, 

J d 2 ip(xi 1 x 2 )x\ dxi = 0, k = 0, 1, . . . ,R, 

and more generally, for each a/0 and (5 

8%i>{axi + (3,x 2 )x k l dx 1 = 0, k = 0, 1, . . . , R. (7.7) 

We shall use (fT7|) to prove (f7?6|) . Letting 

/ an &12 

«21 a 22 
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and with the same notations as before, a simple calculation shows that 021 = — 2 ^ n<5 - As 
a > 2-(J +1 ) and \S\ < vr/2 • 2~Li/ 2 J , we have 

|a 2 i| < c- 2~ j . 



(71 



We then write 

ip(Ax) 



i>(ai\Xx + 012^2, a 2 ixi + 022^2) 

ED n ip(a 1 ix 1 + ai 2 x 2 , a 2 2X 2 ) 21 / h 0((a 2 ixi' 
n! 

n=0 



A' 1 



and, therefore, 

i){Ax)x\ dxi = ^2^ f Dr VOiiZi + ai2X 2 , a22X 2 )^ +fc dx x + O(a^) 



/ 



JV-l 



n=0 



Fix k < D and pick N = D — /c + lso that for n = 0, 1, . . . , N — 1, n + k < D. By virtue 
of 1)7.7)1 all the integrals in the sum vanish and the only remaining term is O(a^) which 
because of C3J is 0(2^ N ). As a consequence, setting m = D/2, we conclude that 



ij)(Ax)x\ dx] 



< C m ■ 2- j( - m+1 \ jfc = 0,l,...m: 



this is the content of (|7.6|) . 

The careful reader will notice that inequality 1)7.6)1 or equivalently 1)5.16)1 is a weaker 
statement than inequality 1)2.6)1 for the definition of nearly vanishing moments. There is no 
doubt that the stronger estimate 1)2.6)1 also holds for curvelet atoms. The proof of this fact 
uses standard arguments and we choose not to reproduce it here. 

Last, the regularity property is a simple consequence of the Cauchy Schwarz inequality; 



a W (xi,x 2 ) < — /|i?/(a,M)ll^| 1/2 NUoo^ 
a,, ./ 



< 



■Il^/IU 2 (Q M ) 



1/2 



\A\dfj, 



2V3vr • 



These last inequalities used the facts that \A\ < 4 for (a, 9, b) £ and Jq d[i < 3-tt. Esti- 
mates for higher derivatives are obtained in exactly the same fashion-after differentiation 
of the integrand. This finishes the proof of the lemma. 



Proof of Urm . Recall that 



1/2 



\TZ(q(D) lfl )(a,b,e)\ 2 dfi 



The first thing to notice is that q(D)j^ is still a family of curvelet molecules, because (/(£) 
is a multiplier of order zero. Since i/j a ,e,b also obeys the molecule properties, lemma 12.31 
implies the corresponding almost-orthogonality condition. Integrating over Q^i does not 
compromise this estimate, as can be seen by applying the Cauchy-Schwarz inequality. 
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Proof of inequality ijg.jj)} . Derivatives of 7^ and a are treated using the following esti- 
mates. 

\d%%(0\ < C a ■ 2~ 3j / i 2- aij 2- a * j / 2 
|Sfo-(0- 1 (a?) j e)l < C« • 2- Hj on W M = supp^). 

We now develop size estimates for the phase perturbation 5. Following closely the discussion 
in [22], p. 407, we claim that on W^, 

\dgdg6(x,0\ < C af3 ■ 2- aij 2- a * j / 2 . (7.9) 

The derivations in x add no complications. Hence, assume that [3 = 0. As the above result 
(|7.9j) relies upon the homogeneity of the phase with respect to £, we recall a few useful facts 
about homogeneous functions of degree one: 

$ = • £ (Euler's theorem), 

• £ = (differentiate the above relation), 

= Od^l 1 -!"!) . 

It follows from the definition that 5(x,£i,0) = and likewise J|-(x,£i,0) = 0. Thus for 

every n, ^j£(x,£i,0) = and ^i(x, £1, 0) = 0. Recall that the support conditions are 

|£i| < C ■ 2 J and |£2| < C • 2 J / 2 . Taylor series expansions about £2 = together with 
homogeneity assumptions give 



r 5(x,£)=0(|£ 2 | 2 |£r 1 ^) = 0(2-^^ 



d d ai 



d ^(x,o = odbwtr 1 -* 1 ) = o(2- j/2 2-^), 

-5(x, £) = 0(|£| 1 - ai - Q2 ) = 0{2- a ^2~ a ^/ 2 ) when a 2 > 2, 



eg 2 eg 



as claimed. The point about these estimates is that they exhibit exactly the parabolic 
scaling of curvelets. We conclude 



and therefore (|5.2jh 
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